PRENTICE-HALL PSYCHOLOGY SERIES 
F. A. MOSS, Ph.D., M.D., EDITOR 


Measurement 

in 

Psychology 



PRENTICE-HALL PSYCHOLOGY SERIES 
F. A. Moss, Ph.D., M.D., Editor 

General Psychology, by Floyd C. Dockeray, Ph.D. 

Student's Guide for Beginning the Study of Psychol- 
ogy, by Valentine, Taylor, Baker, and Stanton. 

Foundations of Abnormal Psychology, by Fred A. Moss, 
Ph.D., M.D., and Thelma Hunt, Ph.D., M.D. 

A Work Book in Educational Psychology, by Harvey C. 
Lehman, Ph.D., and Stuart M. Stoke, Ed.D. 

Child Psychology, by Arthur T. Jersild, Ph.D. 

The Psychology of Adolescence, by Karl C. Garrison, 
Ph.D. 

Social Psychology, by Abraham Myerson, M.D. 

The Principles and Methods of Vocational Choice, by 
Maurice J. Neuberg, Ph.D. 

Comparative Psychology, by Thorndike, Waters, Stone, 
Moss, Purdy, Fields, Franz, Liddell, Heron, Tolman, 
Tryon, and Tinklepaugh. 

Study Outline for General Psychology, by S. L. Craw- 
ley, Ph.D. 

Studying Efficiently, by S. L. Crawley, Ph.D. 

Personality, by Harold V. Gaskill, Ph.D. 

Legal Psychology, by Harold E. Burtt, Ph.D. 

Introduction to Experimental Psychology, by Paul F. 
Finner, Ph.D. 

Educational Psychology, by Skinner, McConnell, Lawther, 
Hartmann, Gray, Fletcher, Thomson, Jersild, Powers, 
Boynton, Conklin, Davis, Webb, Garrison (K.C.), Free- 
man, Witty, Lincoln, Wood, Gifford, Rock, Wallin, Moss, 
Trabue, Aleck, and Washburne. 

Fundamentals of Psychology in Secondary Education, 
by S. C. and K. C. Garrison. 

Measurement in Psychology, by Thelma Hunt, Ph.D., 
M.D. 

Educational Psychology, by Thomas R. Garth, Ph.D. 

The Two Sciences of Psychology, by Arthur D. Fearon, 
Ph.D. 





Measurement 

in 

Psychology 

Thelma Hunt, Ph.D., M.D. 

Assistant Professor of Psychology 
The George Washington University 


New York 

PRENTICE-HALL, INC. 

1937 



Copyright, 1936, by 
PRENTICE-HALL, INC. 

ALL RIGHTS RESERVED. NO PART OF THIS BOOK MAY 
BE REPRODUCED IN ANY FORM, BY MIMEOGRAPH OR ANY 
OTHER MEANS, WITHOUT PERMISSION IN WRITING FROM 
THE PUBLISHERS. 

First Printing October, 1936 

Second Printing June, 1937 


PRINTED IN THE UNITED STATES OF AMERICA 



TO MY VARIOUS TEACHERS AND INSTRUCTORS 


WHO HAVE BEEN RESPONSIBLE FOR THE BACKGROUND 
OF TRAINING AND EXPERIENCE WHICH MAKE 
POSSIBLE THE WRITING OF THIS BOOK 




Preface 


This book is designed primarily as a textbook for 
college courses in psychological tests and measure- 
ments. The aim has been to give a brief survey of the 
whole field of psychological testing. In this respect the 
book will be found to differ from the several books which 
treat only one phase of psychological measurement, as 
the intelligence, the educational, or some other type of 
testing. 

A basic theme of the book is that quantitative study 
and measurement are just as pertinent to psychological 
pursuits as to other scientific pursuits; and that progress 
in psychology will generally be in proportion to the use 
of objective quantitative methods of study. 

It is difficult to give credit to all those who have as- 
sisted in the preparation of this book. I wish to express 
my appreciation to the writers and publishers who have 
permitted me to quote from their works. Specific ac- 
knowledgment is made to them at appropriate places in 
the text. I am indebted to my students in tests and 
measurements, not only for assistance in collecting facts 
and information, but also for their patience and interest 
in connection with my try-out, in teaching, of most of 
the material of the book. Dr. Fred A. Moss has con- 
tributed much through his guidance and encouragement 
during the writing of the book. I am under particular 
obligation to Dr. Katharine Omwake, of Agnes Scott Col- 
lege, who read the whole manuscript and made many 
valuable suggestions; and to Dr. L. J. O’Rourke, of the 
United States Civil Service Commission, who furnished 
suggestions and assistance in the preparation of the 
chapters dealing with tests in the field of personnel 
work. Finally, I wish to thank Miss Margaret Telford 
and Mrs. Sarah Menzer for assistance in the prepara- 
tion of the manuscript. 

Thelma Hunt 

The George Washington University 



Contents 


Part I 

THE PLACE OF MEASUREMENT IN 
PSYCHOLOGY 

Preface 

CHAPTER 

I. The Value of Measurement in Psychology 
    What is measurement? 
    Measurement in everyday life 
    Origin and development of measurement 
    Value of measurement 
    Problems of measurement in psychology 

II. The Instruments of Measurement in Psychology 
    Psychological tests 
    Rating scales 
    Questionnaires 
    Raw measures and relative measures 
    Reliability and validity 

III. The History of Measurement in Psychology 
    The background of intelligence test development: the pre-Binet period 
    The Binet period 
    Preliminary period of group mental tests 
    The World War period: the Army tests 
    The post-War period: application of Army tests and development of similar tests 
    The maturing and stabilizing period of 1921-1926 
    The present period 



Part II 

MEASUREMENT OF INTELLECTUAL QUALITIES 

IV. The Measurement of Mental Deficiency 
    What is mental deficiency? 
    Degrees and types of mental deficiency 
    Why measure mental deficiency? 
    Types of tests suitable for measuring mental deficiency 

V. The Measurement of Superiority 
    Development of interest in the superior 
    The superior school child 
    Tests suitable for testing school groups 
    Mental tests for the college level 
    Measuring the superiority of adults outside of college 

VI. Intellectual Measurement of the Insane 
    General mental tests 
    Memory tests 
    Association tests 

VII. The Uses of Mental Tests 
    Uses in college 
    Uses in schools below the college level 
    Uses of mental tests in vocational selection 
    Uses of mental tests with delinquents and criminals 
    The use of mental tests in various theoretical and research problems 

Part III 

MEASUREMENT OF APTITUDES 

VIII. The Measurement of Special Talents 
    The measurement of musical aptitude 
    Measurement in the field of art 



IX. The Measurement of Mechanical Aptitude 
    The Stenquist Mechanical Aptitude Tests 
    The Minnesota Mechanical Ability Tests 
    The MacQuarrie Test for Mechanical Ability 
    The O’Rourke Mechanical Aptitude Test 

X. The Measurement of Aptitude for Vocations and Professions 
    The problem 
    The development of the aptitude test 
    The nature of the aptitude test 
    The validity of the test 

XI. The Measurement of Interests 
    The methods of measuring interests 

Part IV 

MEASUREMENT OF ACHIEVEMENT 

XII. The Measurement of Achievement in Schools 
    What are tests of achievement? 
    Psychology’s contribution to achievement testing 
    Examples of achievement tests for schools 
    The nature of short-answer achievement tests 

XIII. Objective Achievement Tests in Professional Schools 
    The difficulties encountered 
    Early studies in achievement testing in professional schools 
    An objective bacteriology test 
    Study and standardization of the tests in bacteriology 

XIV. The Measurement of Job Efficiencies 
    A service rating scale 
    Objective psychological tests in the measurement of job efficiencies 

Part V 

MEASUREMENT IN INDUSTRIAL AND 
PERSONNEL FIELDS 

XV. Historical Background of Psychological Measurement in Industry 
    The foundations of psychology applied to personnel problems 
    Early experiments in industrial psychology 
    The effect of the World War 
    History of the testing of applicants 

XVI. Constructing a “Psychological” Test for Employment 
    Analysis of the job 
    Selection of types of tests and questions 
    Length of test 
    Construction of questions 
    Analysis of test material from preliminary trial 
    Final arrangement of test 
    Final application of test and establishment of critical scores 

XVII. Measurement in the Selection of Employees 
    The purpose of measurement in selection 
    Problems related to measurement in the selection of employees 
    Measurement in the selection of employees illustrated 



XVIII. Psychological Measurement in the Control of Employees 
    Example of measurement in relation to accident prevention 
    Time and motion study 
    Measurement in department store management 

Part VI 

MEASUREMENT OF THE MORE GENERAL 
TRAITS OF PERSONALITY 

XIX. The Nature of Measurement of Personality 
    The difficulties of measuring personality traits 
    Methods of personality testing 

XX. The Measurement of Social Attributes 
    The Moss-Hunt-Omwake Social Intelligence Test 
    The Vineland Social Maturity Scale 

XXI. The Measurement of Emotions by Verbal Tests 
    Adjustment questionnaires and tests 
    Extroversion-introversion scales 
    An annoyance test 
    The Pressey X-O test 

XXII. The Measurement of Character and the Moral Sense 
    The tests of conduct knowledge and judgment 
    Behavior tests 
    Other types of tests 



Part VII 

PHYSIOLOGICAL MEASUREMENTS IN 
PSYCHOLOGY 

XXIII. The Measurement of Fatigue 
    Blood pressure 
    Carbon-dioxide combining power of the blood 
    Blood sugar 
    Metabolism 
    Blood cell studies 

XXIV. Laboratory Tests in Mental Disorders 
    Blood tests 
    Spinal fluid tests 
    Basal metabolism 
    Blood pressure 

XXV. Glandular Function Tests 
    Nature of the endocrine glands and their importance to psychology 
    Measurement in relation to endocrine function 

XXVI. Physiological Measurements of Emotion 
    The nature of physiological measurements of emotions 
    Blood pressure measurements of emotion 
    Measurement of breathing 
    Measurement of digestive-system activity 
    Psychogalvanic reflex measurements 
    Emotion indicated by measurement of adrenalin in bloodstream 
    Physiological measurement of emotion in detection of lying 
    Summary 



XXVII. Tests in the Motor and Sensory Fields 
    Cephalic index 
    The strength tests 
    Tests for sensory acuity 
    Color blindness 
    Steadiness tests 
    Reaction time 




List of Figures 


1. Tests in the Pintner-Paterson Performance Scale 
2. Test One: The International Intelligence Test 
3. Porteus Mazes 
4. The Distribution of Intelligence 
5. Diagrammatic Representation of the Selective Process in Determination of Intelligence of School Groups 
6. Mental Growth Curve 
7. Samples from the Meier-Seashore Art Judgment Test 
8. Samples from the McAdory Art Test 
9. Sample from the Stenquist Mechanical Aptitude Test 
10. Minnesota Paper Form Board 
11. Sample Questions from the O’Rourke Mechanical Aptitude Test 
12. Distribution of Medical School Grades According to Medical Aptitude Test Scores 
13. Distribution of Interne Ratings According to Medical Aptitude Test Scores 
14. Comparison of Various Criteria for Admission to Medical Schools 
15. Distribution of Scores for Interest in Personnel Management 
16. Samples from the Ayres Handwriting Scale 
17. Distribution of Probst Service Ratings 
18. Chart Showing Reliability of Probst Service Ratings 
19. Sample Card from Munsterberg's Test for Street-Car Motormen 
20. Relation Between Test Scores and Efficiency of Mail Distributors 
21. Improved Selection Made by New Postal Examination 
22. Circles Test for Dishonesty 
23. The Relationship Between Production and Feelings of Tiredness 
24. Blood Pressure Curves in Fatigue 
25. Metabolism Records Before and After Fatigue 
26. Physiological Changes Produced by Fatigue 
27. Colloidal Gold Curves in Various Mental Disorders 
28. Technique of Blood Œstrin Determination 
29. Blood Œstrin Content at Various Stages of the Sex Cycle 
30. Diagram Showing Method Used to Record Stomach Contractions 
31. Record Indicating Inhibitory Effect of Adrenalin on Intestinal Muscle Contraction 
32. Head Calipers 
33. Dynamometer for Measuring Hand Strength 
34. Instrument for Testing Hand Steadiness 
35. Wabblemeter for Measuring General Bodily Steadiness 
36. Diurnal Variations in Steadiness 
37. Moss-Brown Reaction Time Instrument 


List of Tables 


I. Army Test Ratings 
II. Equivalent Mental Ages and I. Q.’s for Low Army Alpha Scores 
III. Grade Norms (Average) for Scale A, National Intelligence Test 
IV. Mental Age Equivalents for National Intelligence Test Scores 
V. Norms for International Intelligence Test 
VI. Free Association Test (Kent-Rosanoff) 
VII. Responses of Various Groups to Kent-Rosanoff Association Test 
VIII. Uses Made of Mental Tests Among Staff Members at Purdue University 
IX. Employment Methods Used by States 
X. Intelligence Limits for Various Occupations 
XI. Validity of Various Employment Methods 
XII. The Proportion of Successes and Failures at Each Mental Age Level for 413 Packing Jobs 
XIII. Percentage of Various Criminals Who Are Mentally Inferior 
XIV. Measures of the Variations in the I. Q. on Retesting as Found in Several Typical Studies 
XV. Reliability Coefficients for Seashore Tests 
XVI. Validity Coefficients for Seashore Tests 
XVII. Norms on the McAdory Art Test 
XVIII. Correlations between Shop-Teacher Ranks and Series I Stenquist Tests 
XIX. Correlations between Stenquist Assembly Tests and General Intelligence 
XX. Per Cent Marks Assigned to a Physiology Examination 
XXI. Records for Bacteriology Test Study 
XXII. Norms on Bacteriology Test 
XXIII. Comparison of Efficiencies of Instructors 
XXIV. Relationship of Social Intelligence Test Scores to Number of Extra-curricular Activities 
XXV. Relationship of Social Intelligence Test Scores to Activity Scores 
XXVI. Correlations between Social and Abstract Intelligence 
XXVII. Median Social Intelligence Test Scores for Various Occupational Groups 
XXVIII. Effects on Respiration of Fear Produced by Falling 
XXIX. Norms for Strength of Grip (in Kilograms) 
XXX. Correlations between Intelligence and Reaction Time 



Part I 

THE PLACE OF MEASUREMENT IN PSYCHOLOGY 




CHAPTER I 


The Value of Measurement in Psychology 

We are to concern ourselves throughout this text 
with measurement as it is applied to psychology. 
We are to examine the measuring tools which the psychologist 
employs. The aim of the text might be considered 
as twofold: to show the importance to psychology 
of exact measurement; and to familiarize the 
student with the types of psychological measurement 
that have been developed. 

The text will cover more than intelligence and achieve- 
ment tests; it will cover measurement in its widest vari- 
ations. In regard to mental qualities, we shall consider 
not only the ordinarily discussed intelligence tests, but 
also the methods which have been developed for measur- 
ing special aptitudes and various traits of personality. 
In the field of achievement testing, we shall consider the 
tests for measuring achievement in business and industry 
as well as achievement in the schools. And finally, where 
they have been found particularly useful in psycholog- 
ical problems, we shall consider briefly tests borrowed 
from other sciences. 

The method of discussion will consist in designating 
the several varieties of measurement in psychology, and 
in giving examples of the measuring devices which have 
been developed for them. In many instances, ob- 
viously, it will be impossible to mention all the means 
of measurement that have been developed; then we shall 
attempt to select those that are the more important or 
the more representative. 

I. What is Measurement? 

Things are measured when they are expressed quanti- 
tatively; or, we may say that a thing is measured when 
we speak of it in terms of “how much.” To make clear 
what we mean by psychological measurement, let us ex- 
amine a few instances. 

Intelligence is measured when we can describe the 
amount of this quality which a person has. It is in- 
exactly measured when we have means of indicating 
whether a person possesses genius intelligence, superior 
intelligence, average intelligence, subnormal intelligence, 
or intelligence at the feebleminded level. It is more ac- 
curately measured when we have a measuring rod which 
enables us to designate intelligence by an Intelligence 
Quotient of 140, of 128, of 98, or some other definite 
quantitative amount. 

To take another example, achievement is measured 
when we indicate by some measuring device just exactly 
the amount of achievement, or the degree of achieve- 
ment, which a person has reached. This we may indicate 
by stating the number of problems he can solve or the 
number of correct judgments he can make in a given 
field. By other measuring devices the achievement may 
be stated in terms of how the individual compares in 
achievement with other individuals of similar age, of 
similar grade in school, or of similar occupation. 
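
The two ways of stating achievement just described, a raw count of correct responses and a comparison with other individuals, may be sketched in a few lines of Python. The sketch is illustrative only and rests on stated assumptions: the function names, the answer key, and the group scores are hypothetical and come from no actual test, and the percentile rank shown is only one of several possible relative measures.

```python
from bisect import bisect_left, bisect_right


def raw_score(answers, key):
    """Raw measure: the number of items answered correctly."""
    return sum(1 for given, correct in zip(answers, key) if given == correct)


def percentile_rank(score, group_scores):
    """Relative measure: the per cent of the comparison group scoring
    below the given score, with tied scores counted as one half."""
    ordered = sorted(group_scores)
    below = bisect_left(ordered, score)
    ties = bisect_right(ordered, score) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)


# Hypothetical data, invented purely for illustration.
key = ["b", "d", "a", "c", "a"]          # assumed answer key
answers = ["b", "d", "c", "c", "a"]      # one individual's responses
group = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]   # scores of a comparison group

score = raw_score(answers, key)
rank = percentile_rank(score, group)
print(score, rank)
```

The raw score tells how many problems were solved; the percentile rank tells how that performance stands among individuals of similar age, grade, or occupation, whichever group is chosen for comparison.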

To mention one more example, emotional reactions 
are measured when we state the degree of an emotion 
that a person shows, or when we state how he compares 
in his emotional reactions with others. In the sphere of 
emotions our measurements are as yet very rough and 
imperfect. In those of intelligence and achievement 
our methods have been refined to a fair degree of 
accuracy. 


II. Measurement in Everyday Life 

Since psychology observes man in all his activities and 
relationships, we may point out at the beginning the im- 
portance of measurement in man’s everyday life. Meas- 
urement is always with us. There are no human ac- 
tivities that we can think of in which measurement does 
not enter in some way. At birth the child is measured 
for weight and other elements of physiological make-up 
that may indicate his possibilities for future health and 
life. At death a person is measured for length, so that 
he may be placed in a casket of proper size. After death, 
two dates are carved on his tombstone to indicate the 
measure of his life span. 

In his daily activities man is continually guided by 
measurements. His whole day is planned by a device for 
measuring time. All his financial activities are carried 
on in terms of money, a measure of economic exchange. 
His travel is planned according to the measurements of 
distances. His illnesses are cared for by medicine given 
in measured doses. His bad eyesight is cured by a glass 
with a measured curve to fit a measured defect of his 
eye. Thus we are continually measuring our environ- 
ment in order to make the most of it. Many more 
examples of measurement in man’s everyday life might 
be put down, but these will serve to indicate its great 
importance. 

The scope of measurement is almost without limit. 
Limits in size extend from the measurement of the larg- 
est planets in our solar system to the measurement of 
the smallest invisible constituents of matter. With reference 
to nearness and remoteness, the measurements 
may be as near as the structures of our own body, or as 
remote as the farthest star in the heavens. In types of 
measuring devices, the “yardsticks” which are used, 
there is a variety too great to be discussed here. The 
type itself may be as definite as the familiar foot-rule, 
or it may be as subjective as a personal rating made by 
one person about another. 

III. Origin and Development of Measurement 

The origin of measurement goes back farther than 
historical records. In the most ancient ruins of build- 
ings we find evidence of standards of measurement. 
Measurement by such standards may not have been 
measurement in the modern sense, but the ruins show 
that ancient buildings were erected according to some 
regular unit. In many cases, if not in all, the ancient 
units seem to have been in terms of some part of the 
body. The “foot,” it is thought, first appeared in 
Greece, and the standard was said traditionally to have 
been derived from the foot of Hercules. Tradition has 
it that Charlemagne later established the length of his 
own foot as the standard of measurement for his 
country. 

There is an ancient story of a poor man who, in a 
thoughtful mood, asked of a wise man, “Why am I 
poor?” The wise man cut a staff as high as his thigh, 
made notches in it which were his hand’s width apart, 
gave it to the poor man, and said, “I give you a sceptre 
of success, a measuring stick; measures rule the world. 
They go in pairs — the measure of the sandal must 
match the measure of the foot; so all things are made to 
measure. With this stick measure what you make; 
measure well for use. Three loops of cord make it a 
balance to weigh what you buy or sell. Set it upright 
in the sun, and the stick will measure the shadow hours 
of time — allot then thy tasks. Attune thy life to its 
circling shadows; when in spring the noon shadows grow 
long, it is time to plant. Measure your portion and your 
neighbor’s. Make wisely, measure truly, and trade 
justly, and you will prosper.” 

The development of measurement in psychology has 
followed very closely the history of measurement in the 
other sciences. In most of the sciences we find a stage 
in which there is much philosophy, but little measure- 
ment. We may mention alchemy and its relation to 
chemistry, or astrology and its relation to astronomy. 
These two forerunners preceded the development of 
the two natural sciences and the use of exact measure- 
ment. Alchemy and astrology existed at times when 
very few of the factors involved in the sciences were 
known. Phenomena were interpreted subjectively, in 
terms of vague indications. Development of more ex- 
act measurement now gives us the precise, almost in- 
fallible, predictions of the science of chemistry or of 
astronomy. When completed, the history of measure- 
ment in psychology may read very much like the 
history of measurement in such sciences as these. A 
difference at the present time lies in the fact that psy- 
chology has not yet progressed so far beyond subjective, 
introspective types of measurement. In a later chap- 
ter, we shall attempt to follow in more detail the his- 
tory of psychological measurement from the beginning 
of the development of definite standards. 

IV. Value of Measurement 

Progress in any line depends on ability to measure 
exactly. The steam engine could not be created until 
man was able to make a piston and cylinder of such ex- 
act dimensional relationships that although little steam 
could escape between them there would remain suffi- 
cient room for the piston to move up and down. The 
automobile had to wait until man could measure to one 
five-thousandth of an inch. Owing to lack of accurate 
measuring devices, medicine remained largely specula- 
tive almost until modern times; it is only within the 
last hundred years that medical literature has exhibited 
anything like scientific accuracy. Exactness of measure- 
ment is the best index of the development of a 
profession. 

We may indicate the importance of measurement to 
progress in psychology by several quotations from men 
who have contributed much to it: 

The history of science is still written chiefly in terms 
of man’s struggle to know and master the world of 
nature about him. It is a story more dramatic and 
more significant in the history of the race than the story 
of sieges and lost battles and diplomatic victories. But 
significant as the achievement of man may be in the 
conquest of force and matter, of air and land and water, 
its telling may, in the years to come, give way before 
a new epic, the story of man’s study and growing 
knowledge of himself as an individual and in the mass. 

Of this new story, only a few chapters have as yet been 
written, in comparison with what may still be before us 
to know and to write. 

For such a study new norms and new techniques 
must be derived and applied. They are already being 
set up and, as in all sound scientific progress, they are 
built upon the disciplines and the knowledge that the 
world of scholarship has laid at the basis of past prog- 
ress. And as in the past, all fields of exact knowledge 
or bold scientific theory were based on measurement, 
that is, upon mathematics, so in the study of himself 
and his kind, man seeks the aid of mathematics and 
develops bio-chemistry as the handmaid of his newer 
studies of himself and of all animate things.¹ 

Science, as it develops, advances through the stages 
of observation, description, and experimentation to the 
stages of measurement and calculation. Our object is 
to consider the methods, and in some small part, the 
accomplishments of those students of mankind who 
have studied human individuals for the purpose of 
basing broad and permanently valuable mathematical 
generalizations on masses of measurements.² 

Psychology had to await the development of the 
exact and the natural sciences, whose objects are more 
open to measurement, whose contents are more basic, 
and whose applications are more useful. And it 
should be remembered that all the sciences, as we now 
know them, are comparatively new. The doctrine of 
the conservation of energy is only about as old as 
Professor Stumpf, the theory of evolution by natural 
selection about as old as Professor Jastrow. Modern 
physics and modern genetics are no older than the 
younger members of this congress. When sciences of 
earlier origin have made such notable advances during 
the lifetime of those now living, we may look forward 
with hopefulness to a corresponding development of 
psychology within the lifetime of our children.³ 

Psychology cannot attain the certainty and exactness 
of the physical sciences, unless it rests on a foundation 
of experiment and measurement.⁴ 

There are able psychologists who like to narrate what 
they think they think, what they feel they feel, what 
they imagine they imagine. Those of us who are con- 
cerned with quantitative measurements and objective 
results . . . think . . . that such literary diversions 
contribute about as much to a science of psychology 
as similar stories about their rheumatism and other 
bodily ailments would contribute to a science of 
pathology.⁵ 

1 Harris, J. A., Jackson, C. M., Paterson, D. G., Scammon, R. E.: The 
Measurement of Man, The University of Minnesota Press, Minneapolis, 
1930, p. v. 

2 Ibid., p. 6. 

3 Cattell, J. McKeen, Psychology in America, the Science Press, New 
York, 1929, p. 2. 

4 Ibid. (quoting from article on “Mental Tests and Measurements,” 
Mind, No. 69, 1890), p. 30. 

In psychology, measurement may be considered the 
master key by which we secure understanding, predic- 
tion, and control of behavior. We understand the scho- 
lastic failure of the high-school boy by measuring his 
intelligence; we understand the emotional, high-strung 
behavior of the hyperthyroid by measuring his metabolic 
rate; or we understand the automobile accident by 
measuring the reaction time of the driver. The predic- 
tion of behavior through measurement is illustrated by 
the determination of possible college success through 
scholastic ability tests, and the determination of future 
vocational success by vocational aptitude tests. Control 
of behavior is made easier by measurement, because 
through it we have a more complete knowledge of those 
things which we are to control, both in terms of the hu- 
man factor and in terms of environment which may in- 
fluence the human factor. 

The value of measurement to psychology is indicated 
also by the part it has played in the development of the 
various divisions of psychology. Experimental psychol- 
ogy, for example, rests upon the ability to measure psy- 
chological factors, quantitatively, and it had its origin 
in the adoption by psychologists of measurements first 
borrowed from physics. The whole field of psychological 
testing is in itself the field of measuring human 
qualities. Industrial psychology has depended for its 
success upon the quantitative measurement of voca- 
tional aptitudes, and the quantitative study of factors 
important in the control of employees. Child psychol- 
ogy has depended upon our extension of experimental 
and quantitative methods to the study of behavior of 
even the youngest infant. These various spheres will 
be discussed in more detail as the various types of meas- 
urement are taken up in the book. They are mentioned 
here simply to indicate the value of measurement to the 
whole of psychological progress. As in other scientific 
fields, measurements in this one are an index of scientific 
status, of the accuracy and exactness with which we do 
things. 

5 Ibid. (quoting address of the President of the American Association 
for the Advancement of Science, Jan. 8, 1926), p. 32. 


V. Problems of Measurement in Psychology 

Thorndike once made a statement that whatever ex- 
ists at all exists in some amount, and whatever exists in 
amount can be measured. This would lead us to the 
conclusion that all the realities with which we deal in 
psychology are at least potentially capable of measure- 
ment. Why, then, has measurement in psychology 
progressed so slowly? The answer to this question is 
many-sided and has many contributing factors. One 
contributing element lies in the complexity of the things 
to be measured; and still another lies in the attitude of 
people themselves toward allowing the qualities of man- 
kind to be measured. Finally, let us quote one more 
paragraph from the address of J. McKeen Cattell, a pio- 
neer in the field of psychological measurement: 

It may at first sight seem surprising that the sciences, 
both curious and useful, of matter and energy should 
have had an earlier origin and a more systematic development 
than the biological sciences, while psychology 
is only now taking its place among the descriptive 
sciences and has witnessed but its first beginnings as an 
applied science. The explanation is partly in the dif- 
ference in stability and complexity of the objects of the 
different sciences. Matter is plastic to experiment and 
measurement; human behavior eludes experimental and 
quantitative methods. The motions of the solar sys- 
tem since its beginning are less complicated than the 
play of a child for a day. It is also the case that in 
the architectonics of science the mathematical and 
physical sciences are fundamental. Morphology and 
physiology are based upon physics and chemistry; 
psychology on all these sciences. The foundations 
must be laid before we can build the upper stories, in 
which we may prefer to live and from which there may 
be a wider outlook.® 


® Cattell, J. McKeen, op. cit., p. 1. 



CHAPTER II 


The Instruments of Measurement 
in Psychology 

Measurement has already been defined as a 
means of indicating something quantitatively. 
In order to be able to indicate something quantitatively, 
we must have some measuring standard or yardstick, 
as it were, which is generally acceptable. Practically 
all measuring units in the physical sciences represent 
standards of length, weight, or whatever else is being 
measured; these standards are uniformly the same 
wherever and whenever used. We can readily see, 
then, that some such units must be established if we 
are to measure things in the field of psychology. In 
some instances the psychologist has simply adopted al- 
ready established units from the physical sciences. He 
may, for example, be interested in weight measurement 
as indicative of physical growth in the child. For this 
he borrows the standard unit of mass or weight, and 
compares with it the mass or weight of the child which 
he may have in question. If he is interested in measur- 
ing strength, again he borrows a standard from the phys- 
ical sciences. In measuring mental, emotional, and per- 
sonality characteristics, the psychologist had to invent 
units of his own, but in the building up of these new 
types of units the same principles, in general, hold. The 
unit still represents a standard with which the thing to 
be measured is compared, either directly or indirectly. 
For many of the psychological qualities, the compari- 
sons must be indirect rather than direct, as we shall see 
when we discuss the units in more detail. 

It seems well, before a discussion of specific psycho- 
logical tests, to outline the general types of measure- 
ment or of units that have been developed. Three main 
types of measuring devices seem to belong distinctly to 
the psychologist: (1) psychological tests; (2) rating 
scales; and (3) questionnaires. 

I. Psychological Tests 

By the term “test” we shall mean here a means of 
measuring which involves solving a problem or perform- 
ing a task with an accomplishment to show for it. The 
degree or amount of the trait being measured is in- 
dicated by the degree of the accomplishment. The 
accomplishment itself may be graded as to amount 
accomplished, quality or correctness, or time taken to 
attain it. In any event, the subject does something 
and his accomplishment shows the measurement of the 
trait in question. 

Most of these measurements are indirect; that is, they 
directly measure not the trait or ability, but the results 
of this trait or ability in action. In other words, in the 
intelligence test, for instance, intelligence is measured by 
the psychologist not directly but by the accuracy and 
amount of performance on something which demands the 
use of intelligence. We do not know in most instances 
how to make a direct approach, and even in those in which 
directness might seem possible it may be impracticable. 
We may believe, for instance, that intelligence can be 
directly measured in terms of the number of neurones a 
person has, but in a living brain this is obviously an 
impracticable approach to the problem. 




Some brief examples of psychological tests may make 
the meaning of tests somewhat clearer. 

Intelligence tests are tasks which necessitate the use 
of intelligence in their performance. The individual in- 
telligence test involves the performance of varied tasks 
such, for example, as repeating syllables after an ex- 
aminer has spoken them, or carrying out commands 
which are given orally by the examiner, or copying de- 
signs from a sheet of paper. An ordinary pencil-and- 
paper group intelligence test may involve answering 
questions requiring judgment, working problems in arith- 
metical reasoning, or answering questions about the 
meanings of words. Non-verbal intelligence tests for 
very young or illiterate individuals may involve putting 
together puzzle blocks into a complete picture, or plac- 
ing cut-out designs in a form board, or tracing a con- 
tinuous line through a maze. 

A test in aptitude may involve speed of tapping to 
predict possible future speed in typing, or the answering 
of questions based upon information which must be 
acquired before a vocational job can be undertaken. 

A test of social intelligence may involve answering 
questions about social situations which necessitate the 
use of social judgment; or answering questions reflecting 
one’s observation of human behavior and human motiva- 
tions. 

A test of mechanical intelligence may involve putting 
together a dozen pieces to make up some mechanical 
device, or the identification of commonly used mechanical 
tools. 

These will serve as examples of what is meant by a 
“test” as the term is used in psychological measuring. 

Tests used by psychologists may be of two types: 
objective and subjective. They are objective when the 
performance can be graded, rated, or scored the same by 
everybody who attempts to grade them. Such tests can 
be scored by a definite scale which does not necessitate 
judgment or decision on the part of the person doing the 
rating at the time the performance is graded. In other 
words, an objective test is one in which a given per- 
formance will produce the same measurement no matter 
who rates it. 

A subjective test, on the other hand, is one in which 
the judgment or decision of the rater has a great deal to 
do with the grade or rating which a given performance 
receives. To illustrate the differences between these, 
suppose we wish to disclose a person’s knowledge of Eng- 
lish literature. By an objective method, we might give 
him a list of 50 definitely true or false statements about 
books, writers, and movements in the field of English 
literature, and perhaps in addition, a list of 20 books or 
poems to be matched with an equal number of authors. 
Once constructed, such a test could be graded with ex- 
actly the same results by everybody who graded it. On 
the other hand, by a subjective method, we might ask 
the individual in question to discuss first the importance 
of Carlyle and then, perhaps, the romantic movement 
in English literature. It is quite obvious that several 
people attempting to grade an individual’s performance 
on such a test might disagree in their evaluation of it. 
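
The objective method may be pictured as scoring against a fixed key. The short Python sketch that follows is offered only as an illustration under stated assumptions: the five-item key, the sample paper, and the name score_true_false are hypothetical and are not drawn from any actual test.

```python
# Hypothetical answer key for a short true-false test (illustrative values only).
ANSWER_KEY = [True, False, True, True, False]


def score_true_false(responses, key=ANSWER_KEY):
    """Return the number of responses that agree with the key."""
    if len(responses) != len(key):
        raise ValueError("responses and key must have the same length")
    return sum(1 for given, correct in zip(responses, key) if given == correct)


# Two graders applying the same key to the same paper reach the same score,
# because nothing in the scoring depends on the grader's judgment.
paper = [True, True, True, False, False]
grader_one = score_true_false(paper)
grader_two = score_true_false(paper)
```

No comparable key can be written for the essay questions of the subjective method, which is why different graders may disagree there.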

II. Rating Scales 

The rating scale is another means of indicating quanti- 
tatively the degree to which individuals possess abilities 
or traits. In this method, however, instead of having 
the individual perform some task or problem which is 
to indicate the amount of the trait, the measurement 
represents the subjective impressions of someone who 
judges the amount of the trait or ability possessed by 
the individual from previous association with him in 
situations where the trait or ability is supposed to be 
shown. In other words, ratings represent judgments 
made about someone. They may be made about oneself 
or about another, more frequently the latter. The value 
of ratings depends fundamentally upon the accuracy of 
the judges who make them, upon the number of judges, 
and also upon the extent to which the abilities rated lend 
themselves to observation and evaluation. Several types 
of rating scales have been developed. These are dis- 
cussed in Chapter XIX. 

III. Questionnaires 

A questionnaire represents in general a systematic re- 
port of an individual’s experiences, attitudes, interests, 
or beliefs, given by his answers to certain questions. 
It differs from a psychological test primarily in that it 
usually asks for a statement about an individual’s per- 
sonal characteristics or his personal beliefs, rather than 
for the performance of a definite task or problem. It 
differs from a rating scale in that it usually does not 
ask for estimates of the degree to which an individual 
possesses certain traits. It may in some instances, how- 
ever, overlap the rating scale. Both of these measuring 
devices may involve the stating of attitudes toward 
things or of degree of interest in certain things. 

Questionnaires have been used as a rule in the study 
of personality characteristics such as fears, worries, etc., 
of social attitudes and beliefs, and of vocational interests. 
Like the rating method they are also used primarily with 
traits which are not capable of definite measurement by 
a test method, and again as with the rating method, the 
value of the questionnaire can be improved by attention 
to the definiteness and the objectiveness of the questions. 
(See Chapters XI and XIX for examples of question- 
naires.) 

IV. Raw Measures and Relative Measures 

Raw measures. Raw measures represent those values 
which we obtain first and most directly from the unit 
employed. For example, if we use an intelligence test 
with 100 questions, the raw measure represents the num- 
ber answered correctly. Or, in a mechanical aptitude 
test in which an individual assembles 10 mechanical de- 
vices of a dozen pieces each, the raw measure may be 
the number of the mechanical pieces properly placed. 
On a graphic rating scale, a raw measure may represent 
the simple sum of the numerical credits allowed on each 
part of the rating scale. These raw measures represent, 
then, the simple added credits which have been assigned 
to each part of the test or the scale. 

If we had only one measuring device, one yardstick for 
each quality measured, and if each numerical unit were 
equal to every other numerical unit on the scale, then we 
should have no need for anything except the raw meas- 
ures, just as in measuring length we may designate every- 
thing in terms of number of feet. In the latter case we 
are always measuring with the same device, and each 
foot is equal to every other foot. 

Relative measures. In psychological measurement, 
however, we are not always using the same device. We 
may use the Army Alpha Test, by which we measure 
intelligence with a scale extending to 212 points; or the 
Binet Test, with a scale in terms of months of mental 
age; or the Thorndike Intelligence Test, with a scale 
extending over some 500 points. In many of our scales 
the problem is further complicated by the fact that an 
increase of five points on the scale may not mean exactly 
the same thing at different places on the same scale. 
Therefore, in order to make our measurements inter- 
pretable without a very detailed knowledge of each avail- 
able measuring device, we resort to the use of relative 
measures. These are derived from the raw measures in 
such a way as to indicate the ratings of an individual in 
terms of his relationship to others of a similar group or 
in relation to others in the group of which he is a part. 
Most of these relative measures will be discussed in de- 
tail in their proper places; they will only be mentioned 
at this point. 

In measurement of the intelligence of children, prob- 
ably the most commonly used relative measure is the 
I. Q., or Intelligence Quotient, which translates the raw 
intelligence measurement into a rating in terms of the 
child’s relationship to other children of his chronological 
age. In adult measurement probably the most common 
relative measure used is that of percentile rating, which 
states the individual’s position in the group in which he 
is measured, assuming that the group is made up of 100 
persons. Other means of indicating relative measures 
are based upon placing the individual in his proper posi- 
tion in relation to a normal distribution curve of the 
trait being measured, the amount of the trait usually 
being stated in terms of how far the person diverges from 
the average or central tendency of the measurement. 
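
The three relative measures just mentioned can be sketched in a few lines of Python. This is a rough illustration, not a statement of how any particular test is scored. The ratio form of the I. Q. (mental age divided by chronological age, times 100) is the classical definition and is not given in this passage; the percentile convention used here (per cent of the group scoring below) is one of several in use; and the group scores are made up for the example.

```python
from statistics import mean, pstdev


def ratio_iq(mental_age_months, chronological_age_months):
    """Classical ratio I.Q.: mental age over chronological age, times 100."""
    return 100.0 * mental_age_months / chronological_age_months


def percentile_rank(score, group_scores):
    """Percentage of the group scoring below `score` (one common convention)."""
    below = sum(1 for s in group_scores if s < score)
    return 100.0 * below / len(group_scores)


def standard_score(score, group_scores):
    """Distance of `score` from the group mean, in standard-deviation units."""
    return (score - mean(group_scores)) / pstdev(group_scores)


# Made-up raw scores for a group of ten persons.
group = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]

iq = ratio_iq(120, 96)              # mental age 10 years, chronological age 8 years
rank = percentile_rank(70, group)   # position of a raw score of 70 in this group
z = standard_score(70, group)       # divergence of that score from the group average
```

Each function turns the same raw score into a statement about the individual's standing relative to others, which is what allows scores from differently scaled tests to be compared.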

V. Reliability and Validity 

Every measuring device must possess the character- 
istics of reliability and validity. We shall have occasion 
to refer to these terms many times in the course of our 
discussions, for the meeting of these standards forms the 
“acid test” of our ability to measure human traits and 
abilities quantitatively. 

A test or measuring device possesses reliability if it 
gives the same result when applied at different times or 
when applied by different persons. To illustrate with a 
single physical measurement, a yardstick possesses relia- 
bility for measuring length in feet and inches because its 
use gives the same result for a given distance every time 
the distance is measured; furthermore, its use gives the 
same result whether you measure the distance or I meas- 
ure it. To apply the principle to a psychological meas- 
urement, a test for measuring intelligence possesses 
reliability if its use gives the same degree of intelligence 
for a given person upon different applications by different 
testers. The reliability of a psychological test is ordi- 
narily ascertained by giving the test twice to the same 
group and comparing the results of the two measure- 
ments. To the extent that the results of the two testings 
are the same the test can be assumed to be reliable. 
Reliability of psychological tests and measurements de- 
pends primarily upon certain characteristics of the meas- 
uring instrument, the most important of which are 
comprehensiveness and representativeness in the content 
of the test, and objectivity (freedom from dependence 
upon subjective opinion, judgment, and the like). 

A test or measuring device possesses validity when it 
measures what it purports to measure. Our yardstick is 
a valid measure of length because it actually measures 
the physical quality in question. A test designed to 
measure intelligence is a valid test of intelligence if it 
actually measures mental ability; it is not a valid test if 
it measures something else. A test for measuring voca- 
tional ability is a valid test of that ability if those rating 
high on the test actually possess high ability in the voca- 
tion and those rating low possess low ability in the 
vocation. Because of the indirect nature of many psy- 
chological tests and measurements, validity has not been 
an easy standard to attain, and many of the measuring 
devices studied have fallen short of satisfactory validity. 
The validity of a psychological test is ordinarily ascer- 
tained by comparing the test measurements for a group 
of persons with some other measurement or indication 
(criterion) of the trait or ability which the test purports 
to measure. If the test measurements agree with the 
criterion, then the test may be assumed to be valid; if 
they do not, then the test is invalid. For example, the 
validity of a vocational aptitude test may be ascertained 
by comparing test results with actual records of voca- 
tional success of those who have taken the test. 

Correlation. We shall so frequently refer to this term 
in studies of reliability and validity of psychological tests 
that it seems wise to define it at this point. Correlation, 
as the term is ordinarily used, refers to a statistical 
method of determination of the relationship between 
two things, measurements, or variables. The amount of 
relationship is indicated by a numerical result, a coeffi- 
cient of correlation, which may range in size from 0 to 
1.00 and carries a sign showing the direction of the 
relationship. If two things being studied by the cor- 
relation method vary in the same direction (that is, if 
when either is high in amount the other tends to be high), 
the correlation is positive; if they vary in opposite 
directions (if when one is high in amount the other tends 
to be low), the correlation is negative. 

The coefficient is zero if the two things being studied 
are not related at all. For example, should we study by 
the correlation method the relationship between height 
and intelligence in adults, we should expect the correla- 
tion coefficient to be zero, since these two variables are 
unrelated and are not found to vary together. The co- 
efficient is 1.00 if the two things being studied vary to- 
gether in perfect order. For example, if we should com- 
pare by correlation scores or ratings on two psychological 
tests given to a group and should find that the highest 
person in one was highest in the other also, that second 
in one was second in the other also, that third in one was 
third in the other, and so on, the coefficient of correla- 
tion would be 1.00. Relationships short of perfect are 
represented by coefficients of correlation between 0 and 
1.00. If the relationship be nearly perfect, the value 
may be .85 or .90; if only slight, .10 or .20. 
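
The behavior of the coefficient at its extremes can be checked with a small computation. The Python sketch below uses the product-moment formula, which this passage does not name, so the choice of formula is an assumption; the function name pearson_r and the scores are likewise made up for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment coefficient of correlation between two paired lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two pairs")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]   # deviations of the first measure from its mean
    dy = [y - mean_y for y in ys]   # deviations of the second measure from its mean
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("a list with no variation has no defined correlation")
    return numerator / denominator


# Made-up scores of five persons on one test, and on a second test in which
# they stand either in exactly the same order or in exactly the reverse order.
test_a = [10, 20, 30, 40, 50]
same_order = [12, 14, 16, 18, 20]
reversed_order = [20, 18, 16, 14, 12]

r_same = pearson_r(test_a, same_order)
r_reversed = pearson_r(test_a, reversed_order)
```

With these figures the first coefficient is at the positive extreme and the second at the negative extreme. Scores that agree only partly in order give values between the two.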

Correlation coefficients are a very convenient means of 
stating many of the relationships we shall have to dis- 
cuss in considering psychological measurements. They 
enable us to summarize by one numerical expression 
what we might otherwise be able to express only by pre- 
senting much more detailed data. Test reliability and 
validity, which we have just discussed, are commonly 
studied by the correlation method. The results of the 
two applications of a test are “correlated” to 
indicate reliability. Psychological test scores are “cor- 
related” with the criterion measures to indicate validity. 
It is difficult to answer the important question that must 
arise in the reader’s mind at this point. What should 
be the numerical value of coefficients of correlation 
worked out to indicate reliability and validity? Un- 
doubtedly, they should be as high as it is possible to attain. 
Aside from this general statement, we should perhaps 
simply warn the user of psychological tests against put- 
ting too great a dependence on those which show rela- 
tively low reliability and validity coefficients. 



CHAPTER III 


The History of Measurement in Psychology 

Practically every child in public or private 
school today has taken a “psychological” test or a 
mental test — has had some one or more of his mental 
traits measured. And the adult who has been placed 
vocationally in the last five years more likely than not 
has had some test of his mental capacity for performance 
of his job. This use of measurement seems to have 
spread in a short time. The term “mental tests” was 
first used in 1890 by Cattell to designate tasks which he 
hoped would measure intellectual ability. The first in- 
dividual intelligence test, that of Binet, appeared in 1905. 
The first group intelligence tests as we know them today 
were applied during the World War, less than twenty 
years ago. And within the twenty years after the first 
intelligence test was used, psychological testing devices 
had reached practically the stage of development that 
we see today. 

In any particular development in a broad scientific 
field we are likely to be doubly impressed: first, we are 
amazed by the rapidity with which progress takes place 
once the beginning has been made; and second, we 
wonder why development did not begin sooner. We are 
often misled in our judgments of rapidity of development 
because we cannot see all the development; we do not 
see the roots which go far back into the history of science. 
For example, we may see the beginning of intelligence 
tests in Binet’s test of 1905, yet the roots of development 
of his test go much farther back. If we wonder why in- 
telligence and other psychological tests did not occur 
sooner in the history of psychology, let us consider the 
prerequisites to the development of most applications of 
science. Except for some few instances of purely acci- 
dental discovery, applied fields of a science are devel- 
oped only when (1) attitudes both of scientists and of 
the public are conducive to the development, (2) needs 
for the application exist, and (3) techniques in the 
science have been refined sufficiently to make the new 
application possible from a technical standpoint. In- 
telligence tests had to await thorough acceptance of a 
belief in individual differences, for without a belief in 
individual mental differences there was little interest in 
measurement. Changing attitudes toward the mentally 
defective and the “peculiar” individual, also, encouraged 
work on mental tests. The need for these tests had ex- 
isted for some time in the social problem of dealing with 
defectives, criminals, and the like. On the eve of the 
actual development of the first general intelligence test, 
a need was emphasized to psychologists and educators by 
difficulties in the guidance of school children who did not 
progress normally through the grades. Later, probably, 
the best example of the influence of need is seen in the 
hastened development of group intelligence tests due to 
classification problems in the Army during the World 
War. A survey of the techniques necessary for intelli- 
gence test development takes us into the psychological 
laboratory, to those working on the problems of individ- 
ual differences, and to the mathematicians from whom 
psychologists borrowed the many statistical procedures 
which have been indispensable to the development of 
testing. 





The history of measurement falls into seven periods. 
These have been designated solely for convenience of 
discussion, and are not to be thought of as clear-cut in 
any sense. They merge one with the other, and there 
is a great amount of overlapping among the types of 
work represented. They are: (1) Background Period, 
or Pre-Binet Period; (2) Binet Period: The Individual 
Intelligence Test; (3) Preliminary Period of Group 
Mental Tests; (4) World War Period: The Army Tests; 
(5) Post-War Period: Application of the Army Tests 
and Development of Similar Tests; (6) Maturing and 
Stabilizing Period of 1921-1926; (7) Present Period. 

If we seem to confine our discussions to intelligence 
tests, it is because other types of psychological measure- 
ment (aptitude tests, personality tests, etc.) have been 
largely an outgrowth of intelligence tests, taking place 
in the last two periods. 

I. The Background of Intelligence Test Development 
— The Pre-Binet Period 

During this period much was contributed indirectly, if 
not directly, by experimental psychology. Laboratory 
activity brought to psychology an objective aspect which 
the introspectional methods had not produced. Quan- 
titative psychology came into being — measurement be- 
came a part of the investigation of psychological entities. 
At first many of the quantitative aspects were borrowed 
from physics, so that investigators were in reality as 
much physicists as psychologists or were individuals who 
had deserted their original chosen field of physics. Their 
experiments included such things as physiological studies 
of the various senses and measurements of thresholds of 
stimuli for the various senses, sensation levels, speed of 
neural impulses, and reaction time. Although today 
we may not utilize many of the exact measurements with 
which early experimental psychologists busied them- 
selves, the principles and attitudes established by them 
have been of immense value in the development of 
measurements of human traits. 

It does not seem out of place to mention a few of the 
greatest names in this early development of quantitative 
psychology. The honor of founding quantitative psy- 
chology is most frequently given to Gustav Theodor 
Fechner (1801-1887). His book Elemente der Psycho- 
physik, published in 1860, was the starting point for 
quantitative psychology. In this book Fechner brought 
together scattered observations from astronomy, physics, 
and biology, related these to his own elaborate observa- 
tions in physics, mathematics, and physiology, and placed 
them all at the service of psychological measurement. 
Fechner’s important contributions to psychology can be 
named under three headings: a clear expression of 
Weber’s Law (a law regarding the relation between sensa- 
tion and stimulus); an elaboration of the concept of the 
threshold as regards stimuli; and the working out of 
three independent psychophysical methods for measure- 
ment of thresholds. 

Helmholtz may be mentioned as a second great figure 
in early quantitative psychology. In many ways he did 
more to inspire the advance of quantitative investigation 
than did Fechner. Fechner was much of a philosopher, 
with not a little of the mystic in his disposition, and his 
investigations are too often bound up with theories re- 
flecting his mysticism. Helmholtz, on the other hand, 
was wholeheartedly a scientist and an empiricist. His 
greatest contributions lie in quantitative investigations in 
sensory fields of vision and hearing. His investigations 
also included measurements of speed of neural impulse 
and reaction-time experiments. 

The last of three great figures we shall mention in this 
period was Wilhelm Wundt (1832-1920). Wundt seems 
to have done more than any other scientist in this period 
to inspire students to work in the field of quantitative 
psychology. He established the first psychological labo- 
ratory at Leipzig in 1879. Here began the extensive 
sensori-motor measurements which were to be so im- 
portant in the early attempts to measure intelligence. 
Wundt’s influence is seen in the fact that almost all of the 
early psychological laboratories in America as well as 
elsewhere were established by pupils of Wundt, and 
patterned after his laboratory. 

G. Stanley Hall founded in 1883 at the Johns Hopkins 
University what is usually considered the first American 
laboratory for psychology. Hall later went to Clark 
University and was instrumental in the development of 
experimental psychology there. In 1888 Cattell, a pupil 
of Wundt, established a psychological laboratory at the 
University of Pennsylvania; and three years later he left 
to go to Columbia University, where he established an- 
other. E. W. Scripture in 1892 founded a similar labora- 
tory at Yale. This venture, however, did not fare so well 
as did those of Hall and Cattell, and did not attract so 
many distinguished students. 

During the period covering the fifteen or twenty years 
before the publication of the first Binet Scale in 1905, 
there appeared an extensive literature on the study and 
measurement of individual differences and the use of vari- 
ous sensory and motor tests. These represent the begin- 
ning of “mental testing” as such; many of them represent 
definite attempts at measuring the general intellectual 
capacity of individuals. 




In summarizing the character of this period or group 
of studies we note the following: First, there was an 
emphasis on individual differences and their study. The 
interest in individual differences grew out of work in the 
psychological laboratory and out of studies in related 
fields of genetics, anthropology, and education. Second, 
the attempts to measure individual differences were in 
terms of elemental qualities or simple sensory or mental 
processes. For these seemed to the early investigators 
to be the units of mentality or personality as a whole; 
they were more objectively and reliably measurable; 
they seemed to offer a better basis for scientific study; 
and instruments and materials for measuring many of 
them were directly to be borrowed from the psychological 
laboratory. The measurement of complex processes and 
“general” intelligence did not come until the measure- 
ments of the simpler processes had proved inadequate as 
measures of general ability. Third, a large proportion 
of the tests used in these early attempts to study mental 
ability quantitatively were of the sensory or motor type, 
as tests of sensory acuity, ability to discriminate weights, 
ability to react quickly to a motor stimulus, tests of 
strength, etc. These reflected the influence of the early 
psychological laboratories in which methods of this kind 
had been worked out. It was expected or hoped that 
such simple sensori-motor measurements would be indic- 
ative of the general ability of the individual. Fourth, 
this period marked the introduction of statistical proce- 
dures for the study of the value of mental measurements. 

II. The Binet Period 

The Binet Period is that which culminated in the de- 
velopment of the first real intelligence test. Alfred Binet 
stands as the most important figure in the formulation of 
what is today known as the Binet Test or the 
Binet-Simon Test, after Binet and his coworker. Binet 
studied medicine, and through contact with Charcot 
developed an interest in abnormal psychol- 
ogy; in fact most of his early work dealt with medical 
or physiological subjects. He seems to have given up 
abnormal psychology by 1887 and thereafter to have de- 
voted himself to problems more closely allied to educa- 
tional psychology. He began experimentation in the 
schools of Paris and its suburbs. 

When Binet began this work, unsatisfactory results 
were being obtained with the “intelligence” or “mental” 
testing already going on. But failures did not stop the 
work. Numerous special studies were made of various 
aspects of the problems which developed. Binet was a 
most energetic worker in these studies, and there is little 
doubt that his findings were important in the final evo- 
lution of his intelligence scale, which was first published 
in 1905. In fact, many of the separate tests of his later 
scales appear in his earlier studies. 

A bibliography of Binet’s work shows that his early 
investigations included a study of memory and imagina- 
tion, with suggested tests for measuring them. In these 
studies Binet first used the memory for syllables and 
sentences as a test, a method of measuring later consti- 
tuting part of the Binet Intelligence Scale. He also 
reported results in the measuring of suggestibility; in- 
vestigations of the effect of various physical processes, as 
eating, on mental work; and study and measurement of 
attention with a consideration of its relation to intelli- 
gence. Probably the two most important writings of 
Binet previous to the publication of his intelligence scale 
are an article appearing in 1898 in the Revue Philosoph- 
ique and a volume appearing in 1902 entitled The 
Experimental Study of Intelligence. The former dis- 
cusses measurement in individual psychology and sets 
forth many of Binet’s ideas as to how intelligence would 
have to be measured. Binet recognized that indications 
of intelligence, to be valuable, must represent measure- 
ments, not merely descriptions. He also pointed out in 
this early article the inadequacy of the simple tests of 
sensory and motor processes for measuring the sum total 
of intelligence; he did not at that time suggest the solu- 
tion of the problem, but he set down some of the main 
theses on which the work must be developed. He em- 
phasized the importance of standardized conditions where 
the testing of mental processes is carried out, and he 
suggested the use of a gradation of tasks as to difficulty 
in the measurement of mental qualities. Several tests 
mentioned in this article we find later in the construction 
of the Intelligence Scale of Binet and Simon. The Ex- 
perimental Study of Intelligence reports an extensive 
qualitative analysis of the mental abilities of his two 
daughters, with less extensive observations of responses 
of a number of other subjects. Many tests later used by 
Binet appear in this report. From a theoretical stand- 
point the report is of great importance in its emphasis 
upon the testing of higher processes — processes stimulated
by language and other social stimuli — as against
the testing of mere sensory stimuli and the recording of 
responses of the simpler type, which Binet held inade- 
quate for indicating the individual’s intelligence. 

In the latter part of 1904 the Minister of Public In- 
struction appointed a commission, of which Binet was a 
member, to investigate the problem of backward children 
in the Paris schools. In the course of the investigation 
an immediate need for an intelligence scale presented
itself, and it was to meet the emergency that the first
intelligence scale was constructed. 

By this time Binet had rather clearly conceived the 
ideas which made his scale so markedly superior to previ- 
ous attempts to measure intelligence. These ideas may 
be summarized as follows: (1) Measures of simple sen- 
sory and motor processes are not adequate for measuring 
intelligence; higher, more complex processes must be 
measured. (2) A wide variety of tasks or tests must be 
administered to obtain an adequate measurement of in- 
telligence. (3) Conditions under which the tests are 
given must be standardized and kept constant. (4) The 
various parts of the test should show gradations in diffi- 
culty. (5) Age standards for expressing intelligence are 
feasible and practical (this idea, however, did not appear 
in the first scale but was utilized in the revision three 
years later). 

The first of the Binet intelligence scales appeared in 
1905. It consisted of 30 tasks of varying degrees of diffi- 
culty, arranged in order from easiest to most difficult, the 
difficulty being determined from trials on normal and 
subnormal school children. The tasks were all relatively 
simple and easily administered, and the results relatively 
easily scored. Examples of tasks 1, 16, and 28 are given 
below:

1. Visual coordination. Noting the degree of coordi- 
nation of movement of the head and eyes as a 
lighted match is passed slowly before the subject’s 
eyes. 

16. Giving differences between various pairs of familiar 
objects recalled in memory: (a) paper and card- 
board, (b) a fly and butterfly, and (c) wood and 




28. Giving the time that it would be if the large and 
the small hands of the clock were interchanged at 
four minutes to three and at twenty minutes after 
six. A much more difficult test is given those who 
succeed in the inversion; namely, to explain the 
impossibility of the precise transposition indicated. 

The subsequent revisions of the scale differ chiefly in 
that the number of tests was increased and the tests 
were arranged into groups according to age, so that intel- 
ligence could be expressed as a Mental Age dependent 
on the group of tests reached in the scale (reaching the 
tests of the 8-year level, for example, signified a Mental
Age of 8). A more detailed discussion of the final Binet
scale appears in Chapter IV. 
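
The Mental Age idea described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name `mental_age`, the pass-or-fail record for each age group, and the sample data are hypothetical, and the actual scoring rules of the Binet scale (discussed in Chapter IV) are more detailed than this.

```python
def mental_age(levels_passed):
    """Return the highest age level reached, climbing from the lowest level.

    levels_passed maps an age level (in years) to True if the child passed
    that level's group of tests. Treating each group as simply passed or
    failed is an illustrative simplification, not Binet's actual procedure.
    """
    reached = None
    for level in sorted(levels_passed):
        if not levels_passed[level]:
            break  # stop at the first age group that is not passed
        reached = level
    return reached


# Hypothetical record: the child passes every group through the 8-year level.
record = {6: True, 7: True, 8: True, 9: False, 10: False}
print(mental_age(record))  # prints 8
```

Under these assumptions a child who reaches the tests of the 8-year level, but not those of the 9-year level, is assigned a Mental Age of 8, whatever the child's actual age.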

III. Preliminary Period of Group Mental Tests 

After the reliability and validity of the Binet method 
of intelligence testing were established, the need for some 
group method of testing was soon felt. If a group 
method could be developed it would conserve time in 
testing and greatly extend the usefulness of testing. The 
need for a sorting device which would quickly classify 
according to ability a million or more men, recruited dur- 
ing the World War, greatly hastened the final evolution 
of a practical group test of intelligence. 

Prior to the war sporadic instances of group testing oc- 
curred. Many of these were purely adaptations to group 
administration of some of the tests of the Binet series 
or some of its modifications. Reuel H. Sylvester de- 
scribes an early instance of such testing: 

The writer recalls a situation in 1916, in which the 
urge toward group testing was felt most keenly. 

This was before the hastened development of group 
methods under pressure of necessity for marshalling
the Nation’s man-power in the organization of our
citizenry into armies for service in the World War. 

The Medical School of the State University of Iowa 
had been invited to make a comprehensive study of 
the pupils in the public schools of Wapello, Iowa. 

Each pupil was given medical examinations by spe- 
cialists in a dozen fields of medicine, dentistry, and 
school hygiene. Experts on heredity and environ- 
mental factors studied the family and community 
aspects of each child’s life. The writer was assigned 
the task of measuring all children as to intelligence 
and of diagnosing the mental deficiencies of those 
presenting problems. At that time our only avail- 
able measuring devices were the Stanford-Binet 
Scales. To apply any of these to some 500 children 
was a task entirely beyond our resources of personnel 
and time. 

We met the situation by adapting several of the 
Yerkes Point Scale tests to group application. After
these had been given and scored, clinical psycholo- 
gists took the children one at a time, cleared up 
doubtful responses that had been written in the group 
testing, and applied the remaining tests of the Point 
Scale. Thus fairly accurate scores were secured. 

Results from the Wapello survey were so satisfactory
that the method of group testing was applied in
several other school systems in Iowa. At Council 
Bluffs especially valuable results were obtained. 
There an adaptation of the Stanford-Binet Scale was 
used. Superintendent Theodore Saam followed up 
and refined the method and used it extensively later. 
These early efforts are reported as representative of 
such attempts at group mental testing of that early 
period. With the exception of parts of Otis’ tests no 
actual products of that period remain in use today,
but those beginnings revealed the possibilities of the
method and prepared the way for the miraculously
rapid formulation of the Army Tests in 1917.¹

1 Sylvester, Reuel H., “Group Mental Tests” in Clinical Psychology, 
edited by Robert A. Brotemarkle, University of Pennsylvania Press, 
Philadelphia, 1931. 





During this period Whipple published an important
book on tests, many of which were suitable for group
testing.² A paragraph from his introduction emphasizes
the growing interest in tests during this period: 

One need not be a close observer to perceive how 
markedly the interest in mental tests has developed 
during the past few years. Not very long ago attention
to tests was largely restricted to a few laboratory
psychologists; now tests have become objects of at- 
tention for many workers whose primary interest is 
in education, social service, medicine, industrial man- 
agement and many other fields in which applied psy- 
chology promises valuable returns. 

IV. The World War Period — The Army Tests

During this period psychological-test development and 
application received the greatest impetus that they have 
received in their whole history. During the brief period 
of about eighteen months a psychological examining divi- 
sion was established in the Army, the many forms of psy- 
chological tests developed for Army use were constructed, 
trials and standardization of the test material were made 
on limited groups, the final tests were administered to 
almost two million men, and group testing of mental 
abilities was demonstrated to be as feasible as and in 
many ways more practical than the individual testing 
methods which had originated with the work of Binet. 

The Army testing work had its origin in a meeting of a 
group of experimental psychologists at Harvard Univer- 
sity. Upon the declaration of war by the United States 
on April 6, 1917, a session was arranged for discussion of the
relations of psychology to national defense and for consideration
of ways that psychology could be applied in
the solution of military problems. National scope was
given to the psychologists’ interest when the American
Psychological Association joined the more local group.

2 Whipple, G. M., Manual of Mental and Physical Tests: Simpler
Processes, second ed., Warwick and York, Inc., Baltimore, 1914.

Efforts of the psychologists who were working on the 
problem crystallized toward a plan for applying “psycho- 
logical” or mental tests to the recruits in the Army as an 
aid in classifying them for training or occupational duties, 
and in arriving at estimates of their probable value to 
the service. Within two weeks the president of the Amer- 
ican Psychological Association, then Dr. Robert M. 
Yerkes, presented the plans of the Association for psy- 
chological service to the National Research Council of 
the National Academy of Sciences. The favorable recep- 
tion of the plan by the National Research Council gave 
it more official sanction and undoubtedly did much to
insure its final adoption by the War Department. A 
tentative plan for the psychological examination of re- 
cruits was submitted to the Surgeon General of the Army 
on May 1. It met with official favor, but some weeks 
passed before official machinery was put into action. Dur- 
ing the following three months a committee headed by 
Dr. Yerkes developed a great deal of the test material 
later used, and carried out unofficial trials of their mate- 
rial in certain Army and Navy stations. Early in August 
Dr. Yerkes was appointed to the Army with a rank of 
major to organize and direct psychological examinations 
for the Medical Department of the Army. The work 
was now officially begun and, after sufficient trial, the 
psychological tests were extended to include all Army 
recruits. 

With this chronological review of the Army work let 
us examine briefly the task which faced those engaged in 
it. Their general problem was that of preparing an adequate
test for measuring the mental level of large groups
of men at the same time. They could not utilize the 
Binet methods, since these necessitated individual ex- 
amination of each recruit — a method obviously too time- 
consuming for use in testing upwards of a million and a 
half men. In addition to being applicable to large numbers
of men in a short time, the test set up had to meet
certain other requirements — it must not depend upon
specific school information, since many of the examinees 
had had little formal schooling; it should require a mini- 
mum of writing by the subject; it should be capable of 
measuring over a wide range of abilities, so that all men 
could be measured, from the mentally defective to those 
of highest ability; it should be easily and objectively 
scored and rated; and a number of equivalent forms of 
the test should be constructed to prevent coaching. 

The preliminary experimentation and trials of material 
led to the development of two new types of tests — one, 
the Army Alpha Test, for persons who could read and 
write the English language; and another, the Army Beta 
Test, for persons who could not read or write. A form 
of each of these is reproduced at the end of this chapter. 

In addition to the two new group tests developed, the 
psychological division made considerable use of individual 
tests where doubtful results were obtained by group 
methods. For the individual tests the Binet Test or 
a modification of it was used with English-speaking 
or literate individuals, and a scale of manual perform- 
ance tests * for illiterates or foreigners. 

* See Chapter IV for an outline of this test.

Intelligence ratings in the Army were usually translated
into letter grades, A signifying high scores on the test, B
fairly high, C+ somewhat above average, C average, C-
below average, D low, and D- and E very low. Table
I shows the Army test score equivalents for these letter
ratings and the percentage of white soldiers receiving each
letter grade.


                                 Table I

                          ARMY TEST RATINGS *

         Per Cent of
         White Soldiers    Range of Scores Corresponding to Letter Ratings
Letter   Receiving         Alpha      Beta       Performance   Binet
Grade    Grade             Test       Test       Test          Mental Age

A              4           135-212    100-118    260-^11       18.0-19.5
B              8           105-134    90-99      240-259       16.5-17.9
C+            15           75-104     80-89      215-239       15.0-16.4
C             25           45-74      65-79      190-214       13.0-14.9
C-            24           25-44      45-64      150-189       11.0-12.9
D             17           15-24      20-44      90-149        9.5-10.9
D- & E         7           0-14       0-19       0-89          0.0-9.4
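
The conversion of an Alpha score into a letter rating, as laid out in Table I, can be sketched as a simple lookup. The score boundaries and percentages below are taken from the table; the names `ALPHA_LOWER_BOUNDS`, `PERCENT_OF_WHITE_SOLDIERS`, and `alpha_letter_grade` are hypothetical and serve only this illustration, which is not a reproduction of the Army's own scoring procedure.

```python
# Lower bound of each Alpha score range in Table I, highest grade first.
ALPHA_LOWER_BOUNDS = [
    (135, "A"),
    (105, "B"),
    (75, "C+"),
    (45, "C"),
    (25, "C-"),
    (15, "D"),
    (0, "D- and E"),
]

# Per cent of white soldiers receiving each grade, as given in Table I.
PERCENT_OF_WHITE_SOLDIERS = {
    "A": 4, "B": 8, "C+": 15, "C": 25, "C-": 24, "D": 17, "D- and E": 7,
}


def alpha_letter_grade(score):
    """Map an Alpha score (0 to 212) to its letter rating from Table I."""
    if not 0 <= score <= 212:
        raise ValueError("Alpha scores in Table I run from 0 to 212")
    for lower_bound, grade in ALPHA_LOWER_BOUNDS:
        if score >= lower_bound:
            return grade


print(alpha_letter_grade(110))                   # prints B
print(sum(PERCENT_OF_WHITE_SOLDIERS.values()))   # prints 100
```

The second line printed is a check on the table itself: the seven percentages account for all of the soldiers rated.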


As an immediate result of the Army tests, about 8,000 
men were recommended for discharge from the Army for 
mental deficiency, about 10,000 were assigned to labor 
battalions or other services requiring only low-grade 
intelligence, and about 10,000 were recommended for spe- 
cial battalions for special training or for further observa- 
tion because of their low ability. For higher grade 
groups much use was made of the test results in assign- 
ments to training groups, in vocational placements, and 
in the development of officer material. 

The more far-reaching results of Army testing extend
into all the subsequent developments of group testing 
methods for measuring human traits and abilities, ex- 
amples of which are to be considered throughout this 
text. 


* National Academy of Sciences, Memoirs, Vol. XV, p. 195. 





V. The Post-War Period — Application of the Army
Tests and Development of Similar Tests 

This period, which covered roughly the two years fol- 
lowing the World War, was characterized by the spas- 
modic application of the Army tests to every conceivable 
kind of group. As is likely to be true where there is 
over-enthusiasm for a new device, much chaos marked 
these early trials. Often too much confidence was placed 
in quickly carried out testing programs; many persons 
attributed values to the tests that they were never de- 
signed to have; and too often the findings of scientific 
investigators were misinterpreted and given fantastic 
meanings by lay interpreters and popular publicity. But 
many of these early studies following the development 
of the Army tests led to the great expansion of psycho- 
logical testing which has taken place in the last fifteen to 
twenty years. 

In the development of new tests, the Army testing 
work bore fruit first in the construction of group tests 
for use in schools and colleges. One of the earliest of 
these was a group intelligence test published by Otis. 
Otis had, in fact, been working on testing material be- 
fore the war and had contributed many of his ideas and 
questions to the Army tests. His first academic scale 
was published in 1918 shortly before the close of the war.*
Some forms of the Otis Intelligence Tests are still widely 
used in measuring intelligence in high school. Another 
group test for measuring intelligence which appeared 
very early was a scale by S. L. and L. W. Pressey. Their 
test was designed for high schools, and though it is not 
widely in use today, it was used in many of the early 
school surveys. 

* See Chapter V for a discussion of one of Otis’s tests.

In 1919 appeared the Haggerty Delta 1 and Delta 2 
Tests. The former, largely non-linguistic and similar 
to the Army Beta, was designed for children of the pri- 
mary grades; the latter, more like the Army Alpha, was 
for children of upper elementary grades. Both scales 
were worked out in connection with mental surveys which 
the author conducted in Virginia schools. 

A year later the National Intelligence Tests appeared. 
These tests were the outcome of an extensive study of
test material by a committee consisting of Haggerty, Ter- 
man, Thorndike, Whipple, and Yerkes, the study having 
been financed by a grant from the General Education 
Board. Most of the men who worked out these tests 
had been in the psychological service during the war, 
and their recent experience was brought to bear in the 
development of these group tests for academic use. Two 
scales, A and B, with two equivalent forms each, were 
constructed, and by giving the tests to large numbers 
of children, norms for each age and grade were estab- 
lished. The tests are suitable for the elementary grades. 
They have been used probably more widely than any 
other single test for the elementary school. 

These examples represent important developments dur- 
ing the period immediately following the Army testing. 
There were many other studies not mentioned in this 
brief sketch, and work on some of the important tests 
appearing later, as the Thorndike Intelligence Examina- 
tion for College Students, was begun during this period. 

VI. The Maturing and Stabilizing Period of 1921-1926 

Intelligence testing now advanced to practically its 
present stage and settled down to its real effectiveness. 
The period is marked primarily by the following features: 
(1) New tests in the field of group mental testing continued
to be developed. Particularly was there an extension
of the group testing methods to the construction
of new tests at the upper and lower ends of the testing 
scale. College tests and kindergarten or preschool tests, 
slower to appear than the general adult test and the in- 
termediate school test, began to appear. (2) Much fun- 
damental work was done on the validation of the new 
group tests of mental ability. These studies firmly es- 
tablished the group method of measuring ability, confirm- 
ing for the newer tests in the schools what had been 
found true in the Army testing. (3) Those measuring 
instruments which stood the tests of validity and reli- 
ability were standardized so that ratings on them could 
be properly interpreted in terms of relative abilities. (4) 
Many studies of the uses and applications of intelligence 
tests were made in the schools and, to a more limited 
extent, in other organizations interested in measuring 
human traits. Such studies established the use of mental 
tests on a scientific basis and pointed the way to aban- 
donment of some of the earlier sweeping claims made 
for mental tests. (5) The methods and techniques of 
the newly developed intelligence tests began to be ap- 
plied in developing other measuring instruments. The 
“short-answer” or “psychological” type of question was 
early adapted to measuring achievement, marking the 
beginning of the objective “short-answer” type of class- 
room examination in the various school subjects. The 
methods of the general mental test were also early
adapted to the working out of more specialized tests, such 
as the special aptitude tests for vocations and occupa- 
tions, or the special ability test exemplified by the tests 
for measuring mechanical aptitude. Finally, there was 
a growing interest in the measurement of personality and 
character traits. 





VII. The Present Period 

The present period of psychological testing, at least 
for intelligence testing, is one of matured application in 
which group mental testing has recently changed but 
little. In academic uses, mental tests have become 
rather firmly established, their true values are generally 
understood, and their limitations have been fairly well 
determined. In other fields, outside the schools, mental 
testing is not yet on quite so firm a footing; but the last 
ten years have witnessed a marked increase in the utili- 
zation of the methods of psychological testing in such 
fields as those of employment and personnel work. 
Newest present-day developments in psychological meas- 
urement are those in personality and character testing; 
in fact, this chapter in the history of our subject is yet 
far from completed. 

We shall not attempt here to sketch in any detail these 
last periods of the history of psychological testing. The 
rest of this book will tell their story. 



ARMY ALPHA TEST *

TEST 1

[Figure: answer form for Test 1, consisting of twelve numbered rows of circles, letters, words, and numbers.]

* Reproduced by permission of the National Academy of Sciences.







TEST 2 

Get the answers to these examples as quickly as you can.
Use the side of this page to figure on if you need to.

SAMPLES

1 How many are 5 men and 10 men? Answer ( 15 )

2 If you walk 4 miles an hour for 3 hours, how far do
you walk? Answer ( 12 )

1 How many are 20 boats and 9 boats? Answer ( )

2 If you save 54 a month for 9^ months, how much will you
save? Answer ( )

3 If 04 men are divided into squads of 8, how many squads will
there be? Answer ( )

4 Mike had 11 cigars. He bought 3 more and then smoked 8. How
many cigars did he have left? Answer ( )

5 A company advanced 6 miles and retreated 2 miles. How far was
it then from its first position? Answer ( )

6 How many hours will it take a truck to go 48 miles at the rate of 3
miles an hour? Answer ( )

7 How many cigars can you buy for $1.00 at the rate of 2 for 8
cents? Answer ( )

8 A regiment marched 40 miles in five days. The first day they
marched 9 miles, the second day 6 miles, the third 10 miles,
the fourth 7 miles. How many miles did they march the last
day? Answer ( )

9 If you buy 2 packages of tobacco at 7 cents each and a pipe for
75 cents, how much change should you get from a two-dollar
bill? Answer ( )

10 If it takes 5 men 4 days to dig a 2U^ foot drain, how many men
are needed to dig it in half a day? Answer ( )

11 A dealer bought some mules for $1,200. He sold them for $1,500,
making $50 on each mule. How many mules were there? Answer ( )

12 A rectangular bin holds 500 cubic feet of lime. If the bin is 10
feet long and 6 feet deep, how wide is it? Answer ( )

13 A recruit spent one-eighth of his spare change for post cards and
twice as much for a box of letter paper, and then had $2.00 left.
How much money did he have at first? Answer ( )

14 If tons of bark cost $33, what will 3^ tons cost? Answer ( )

15 A ship has provisions to last her crew of 400 men 5 months. How
long would it last 1,000 men? Answer ( )

16 If an aeroplane goes 300 yards in 10 seconds, how many feet does
it go in a fifth of a second? Answer ( )

17 A U-boat goes 6 miles an hour under water and 20 miles an hour
on the surface. How long will it take to cross a 100-mile channel,
if it has to go three fifths of the way under water? Answer ( )

18 If 214 squads of men are to dig 5,992 yards of trench, how many
yards must be dug by each squad? Answer ( )

19 A certain division contains 6,000 artillery, 15,000 infantry, and
1,000 cavalry. If each branch is expanded proportionately
until there are in all 24,200 men, how many will be added to the
artillery? Answer ( )

20 A commission house which had already supplied 1,897 barrels of
apples to a cantonment delivered the remainder of its stock to 38
mess halls. Of this remainder each mess hall received 45 barrels.
What was the total number of barrels supplied? Answer ( )







TEST 3 

This is a test of common sense. Below are sixteen questions. Three answers are given to each
question. You are to look at the answers carefully, then make a cross in the square before the best
answer to each question, as in the sample:

SAMPLE Why do we use stoves? Because

☒ they keep us warm

□ they look well

□ they are black

Here the answer "they keep us warm" is the best one and is marked with a cross. Begin with No. 1 and keep on
until time is called.



1 Cotton fibre is much used for making cloth
because

□ it grows all over the South

□ it can be spun and woven

□ it is a vegetable product

2 Thermometers are useful, because

□ they regulate the temperature

□ they tell us how warm it is

□ they contain mercury

3 Why are doctors useful? Because they

□ understand human nature

□ always have pleasant dispositions

□ know more about diseases than others

4 Why ought a grocer to own an automobile?
Because

□ it is useful in his business

□ it uses rubber tires

□ it saves railroad fare

5 A machine gun is more deadly than a rifle,
because it

□ was invented more recently

□ fires more rapidly

□ can be used with less training

6 Why is the telephone more useful than the
telegraph? Because

□ it gets a quicker answer

□ it uses more miles of wire

□ it is a more recent invention

7 Why is wool better than cotton for making
sweaters? Because

□ wool is cheaper

□ it is warmer

□ it wears longer

8 Why is New York larger than Boston? Because

□ it has more railroads

□ it has more millionaires

□ it is better located

Go to No. 9 above

9 Every soldier should be inoculated against
typhoid fever, because

□ many men have typhoid fever

□ the doctors insist on it

□ it prevents epidemics

10 Theatres are useful institutions because

□ they employ actors

□ they afford a method of relaxation

□ they give the rich a chance to spend their
money

11 A train is harder to stop than an automobile
because

□ it is longer

□ it is heavier

□ the brakes are not so good

12 Why is winter colder than summer? Because

□ the sun shines obliquely upon us in winter

□ January is a cold month

□ there is much snow in winter

13 Many schools are closed in summer, so that

□ the teachers may have a vacation

□ the children shall not be indoors in hot
weather

□ the school houses may be repaired

14 If a drunken man is quarrelsome and insists
on fighting you, it is usually better to

□ knock him down

□ call the police

□ leave him alone

15 Why are electrical engineers highly paid?
Because

□ their ability is much in demand

□ they have a college education

□ they work long hours

16 Aeroplanes failed for many years because

□ they were too heavy

□ the materials cost too much

□ the motor was not perfected






TEST 4 

If the two words of a pair mean the same or nearly the same, draw a
line under same. If they mean the opposite or nearly the opposite, draw a
line under opposite. If you cannot be sure, guess. The two samples are
already marked as they should be.

SAMPLES

good — bad .................. same — opposite [opposite is underlined]
little — small .............. same — opposite [same is underlined]

1 high — low ................ same — opposite
2 slow — fast ............... same — opposite
3 large — great ............. same — opposite
4 danger — safety ........... same — opposite
5 genuine — real ............ same — opposite

6 choose — select ........... same — opposite
7 fault — virtue ............ same — opposite
8 similar — different ....... same — opposite
9 jealousy — envy ........... same — opposite
10 sacred — profane ......... same — opposite

11 conquer — subdue ......... same — opposite
12 vanity — conceit ......... same — opposite
13 allure — attract ......... same — opposite
14 waste — conserve ......... same — opposite
15 deride — ridicule ........ same — opposite

16 censure — praise ......... same — opposite
17 illustrious — exalted .... same — opposite
18 agitate — excite ......... same — opposite
19 haggard — gaunt .......... same — opposite
20 con — pro ................ same — opposite

21 eminent — distinguished .. same — opposite
22 conspicuous — prominent .. same — opposite
23 depressed — elated ....... same — opposite
24 orifice — aperture ....... same — opposite
25 erudite — scholarly ...... same — opposite

26 recline — stand .......... same — opposite
27 degenerate — deteriorate . same — opposite
28 martial — civil .......... same — opposite
29 nonchalance — anxiety .... same — opposite
30 torpor — stupor .......... same — opposite

31 comprehensive — restricted same — opposite
32 latent — hidden .......... same — opposite
33 node — knot .............. same — opposite
34 celestial — terrestrial .. same — opposite
35 carnivorous — herbivorous  same — opposite
36 urbanity — civility ...... same — opposite
37 proclivity — inclination . same — opposite
38 putrid — fetid ........... same — opposite
39 impecunious — opulent .... same — opposite
40 choleric — phlegmatic .... same — opposite





TEST 5 

The words A EATS COW GRASS in that order are mixed up and
don’t make a sentence, but they would make a sentence if put in the
right order: A COW EATS GRASS, and this statement is true.

Again, the words HORSES FEATHERS HAVE ALL would make a
sentence if put in the order ALL HORSES HAVE FEATHERS, but this
statement is false.

Below are twenty-four mixed-up sentences. Some of them are true
and some are false. When I say “go,” take these sentences one at a
time. Think what each would say if the words were straightened out,
but don’t write them yourself. Then, if what it would say is true, draw
a line under the word “true”; if what it would say is false, draw a line
under the word “false.” If you can not be sure, guess. The two
samples are already marked as they should be. Begin with No. 1 and
work right down the page until time is called.

SAMPLES

a eats cow grass ............................ true .. false [true is underlined]
horses feathers have all .................... true .. false [false is underlined]

1 iron heavy is ............................. true .. false
2 chairs sit are to on ...................... true .. false
3 Alaska in cotton grows .................... true .. false
4 happy is man sick always a ................ true .. false
5 wood eat and good to are coal ............. true .. false
6 Germany of Wilson king is England and ..... true .. false
7 day it snow does every not ................ true .. false
8 war in are useful aeroplanes .............. true .. false
9 sounds people some loud annoy ............. true .. false
10 thunders rains when it always it ......... true .. false
11 food is tobacco as valuable a not ........ true .. false
12 trees roses sea and in grow the .......... true .. false
13 pole north equator mile one from is the the  true .. false
14 a battle in racket very tennis useful is . true .. false
15 made cloth wool cotton and is from ....... true .. false
16 seldom forever good lasts luck ........... true .. false
17 a ocean cross minutes few can boat the in a  true .. false
18 seldom birds’ diamonds nests are in found  true .. false
19 love we wrong those us always who ........ true .. false
20 to aid deep great snow a military manoeuvres is  true .. false
21 never man the show the deeds ............. true .. false
22 always is not a a stenographer bookkeeper  true .. false
23 never who heedless those stumble are ..... true .. false
24 people enemies arrogant many make ........ true .. false





TEST 6 

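SAMPLES

 2    4    6    8   10   12   ..14..  ..16..
 9    8    7    6    5    4   ..3..   ..2..
 2    2    3    3    4    4   ..5..   ..5..
 1    7    2    7    3    7   ..4..   ..7..

Look at each row of numbers below, and on the two dotted lines
write the two numbers that should come next.

10   15   20   25   30   35   ......  ......
 8    7    6    5    4    3   ......  ......
 6    9   12   15   18   21   ......  ......
 5    9   13   17   21   25   ......  ......
 8    1    6    1    4    1   ......  ......
25   25   21   21   17   17   ......  ......
 1    2    4    8   16   32   ......  ......
 4    5    8    9   12   13   ......  ......
 8    8    6    6    4    4   ......  ......
19   16   14   11    9    6   ......  ......
 3    4    6    9   13   18   ......  ......
12   14   13   15   14   16   ......  ......
29   28   26   23   19   14   ......  ......
18   14   17   13   16   12   ......  ......
16    8    4    2    1   1/2  ......  ......
15   16   14   17   13   18   ......  ......
 1    4    9   16   25   36   ......  ......
21   18   16   15   12   10   ......  ......
 3    6    8   16   18   36   ......  ......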





TEST 7 

SAMPLES

sky — blue : grass — table green warm big [green is marked]
fish — swims : man — paper time walks girl [walks is marked]
day — night : white — red black clear pure [black is marked]

In each of the lines below, the first two words are related to each other in some way.
What you are to do in each line is to see what the relation is between the first two words,
and underline the word in heavy type that is related in the same way to the third word.
Begin with No. 1 and mark as many sets as you can before time is called.

1 finger — hand : toe — box foot doll coat
2 sit — chair : sleep — book tree bed see
3 skirts — girl : trousers — boy hat vest coat
4 December — Christmas : November — month Thanksgiving December early
5 above — top : below — above bottom sea hang
6 spoon — soup : fork — knife plate cup meat
7 bird — song : man — speech woman boy work
8 corn — horse : bread — daily flour man butter
9 sweet — sugar : sour — sweet bread man vinegar
10 devil — bad : angel — Gabriel good face heaven
11 Edison — phonograph : Columbus — America Washington Spain Ohio
12 cannon — rifle : big — bullet gun army little
13 engineer — engine : driver — harness horse passenger man
14 wolf — sheep : cat — fur kitten dog
15 officer — private : command — army general obey regiment

16 hunter — gun : fisherman — fish net bold wet
17 cold — heat : ice — steam cream frost refrigerator
18 uncle — nephew : aunt — brother sister niece cousin
19 framework — house : skeleton — bones skull grace body
20 breeze — cyclone : shower — bath cloudburst winter spring

21 pitcher — milk : vase — flowers pitcher table pottery
22 blonde — brunette : light — house electricity dark girl
23 abundant — cheap : scarce — costly plentiful common gold
24 polite — impolite : pleasant — agreeable disagreeable man face
25 mayor — city : general — private navy army soldier

26 succeed — fail : praise — lose friend God blame
27 people — house : bees — thrive sting hive thick
28 peace — happiness : war — grief fight battle Europe
29 a — b : c — e b d letter
30 darkness — stillness : light — moonlight sound sun window

31 complex — simple : hard — brittle money easy work
32 music — noise : harmonious — hear accord violin discordant
33 truth — gentleman : lie — rascal live give falsehood

34 blow — anger : caress — woman kiss child lova 34 

36 aquare— cube. : circle— lina round equare spbera . ... .35 


36 mountain — \alley .genius — idiot write think brain 36 

37 clock — time, thermometer — cold wootbor loroporoturo mercury 37 

38 fear — anticipation . regret — vam memory express resist 38 

39 hope— cheer despair — grave repair death depression ... 39 

40 dismal — dark .. cheerful — laugh bright bouse gloomy 40 



Army Alpha Test (Concluded)

TEST 8

Notice the sample sentence:

People hear with the    eyes  ears  nose  mouth

The correct word is ears, because it makes the truest sentence.

In each of the sentences below, you have four choices for the last word. Only one of them is correct.
In each sentence draw a line under the one of these four words which makes the truest sentence. If
you can not be sure, guess. The two samples are already marked as they should be.

SAMPLES
People hear with the    eyes  ears  nose  mouth
France is in    Europe  Asia  Africa  Australia

1  The pitcher has an important place in    tennis  football  baseball  handball
2  Cribbage is played with    rackets  mallets  dice  cards
3  The Holstein is a kind of    cow  horse  sheep  goat
4  The most prominent industry of Chicago is    packing  brewing  automobiles  flour
5  The topaz is usually    red  yellow  blue  green
6  The Plymouth Rock is a kind of    horse  cattle  granite  fowl
7  Irving Cobb is famous as a    baseball player  actor  writer  artist
8  Clothing is made by    Smith & Wesson  Kuppenheimer  B. T. Babbitt  Swift & Co.
9  Carrie Chapman Catt is known as a    singer  writer  nurse  suffragist
10 "The flavor lasts" is an "ad" for    chewing gum  drink  health food  fruit
11 Timothy is a kind of    corn  rye  wheat  hay
12 Kale is a    fish  lizard  vegetable  snake
13 The U. S. Naval Academy is at    West Point  Annapolis  New Haven  Ithaca
14 Rio Janeiro is a city of    Spain  Argentina  Portugal  Brazil
15 Emeralds are obtained from    elephants  mines  oysters  reefs
16 John Sargent is famous as a    sculptor  author  painter  poet
17 The iguana is a    reptile  bird  fish  insect
18 The clavicle is in the    shoulder  head  abdomen  neck
19 Karo is a    patent medicine  disinfectant  tooth paste  food product
20 Eucalyptus is a    machine  tree  drink  fabric
21 The carbine is a kind of    pistol  cannon  musket  sword
22 The multigraph is a kind of    typewriter  pencil  copying-machine  phonograph
23 Magenta is a    fabric  drink  food  color
24 The piccolo is used in    music  stenography  book-binding  lithography
25 Cambric is a    dance  fabric  food  color
26 The author of "Treasure Island" is    Poe  Stevenson  Kipling  Hawthorne
27 Blackstone is most famous in    law  literature  science  religion
28 The spark plug belongs in the    crank case  manifold  carbureter  cylinder
29 The Bartlett is a kind of    fruit  fish  fowl  cattle
30 Kelvin was most famous in    politics  war  science  literature
31 Little Nell appears in    Vanity Fair  Romola  The Old Curiosity Shop  Henry IV
32 The number of a Papuan's legs is    two  four  six  eight
33 Arson is a term used in    medicine  law  theology  pedagogy
34 The silo is used in    fishing  farming  hunting  athletics
35 A puck is used in    tennis  football  hockey  golf
36 Dewey defeated the Spanish fleet in    Newport News  Boston Harbor  China Sea  Manila Bay
37 The volt is used in measuring    electricity  wind power  rainfall  water power
38 The Packard car is made in    Detroit  Buffalo  Toledo  Flint
39 The Cooper Hewitt lamp used the vapor of    gasolene  mercury  tungsten  alcohol
40 A regular five-sided figure is    scalene  rhomboid  equilateral  elliptical





ARMY BETA TEST

[Facsimile pages of the Army Beta Test, including Test 4 and Test 5
(Test 5 presents pairs of multi-digit numbers); illustrations not
reproduced.]

Part II 

MEASUREMENT OF INTELLECTUAL QUALITIES 




CHAPTER IV 


The Measurement of Mental Deficiency 

I. What Is Mental Deficiency? 

Mental deficiency (or feeblemindedness) is defined
in terms of intelligence. It is the presence
of a relatively small amount of whatever we understand 
to be intelligence or mental alertness. If intelligence is 
the ability to carry on abstract thinking, then mental 
deficiency is the relative lack of ability to engage in such 
a process; if intelligence is the capacity for learning new 
things, then mental deficiency is the relative absence of 
such a capacity; and if intelligence is a quality of gen- 
eral adaptability, then mental deficiency is a relative 
inability to adapt. Exactly where we draw the line that 
marks off the mentally deficient is arbitrary. The best 
person below the line must be almost, if not quite, as 
good as the lowest person above it. We might say that 
a person is mentally deficient if his learning capacity is 
so low that he cannot graduate from elementary school. 
However, we might arbitrarily, and with just as much 
logic and justification, say that a person is to be labeled 
“mentally deficient” only when he is incapable of learn- 
ing to feed and dress himself. Where the line is drawn 
has been determined largely by convenience in dealing 
with the results. 

Early definitions of mental deficiency or feeblemindedness,
before the advent of mental testing and psychological
study of intelligence, were likely to be legal or
sociological. The legal definitions usually emphasized
the lack of responsibility of the deficient one before the
law. Sometimes they concerned the obligations of mill-owners
and other industrial managers in selecting and
training child workers in their plants. Sociological definitions
of mental deficiency regard the ability of the
individual to get along with his fellows, to manage his 
own affairs, to support himself in the world. The men- 
tally deficient find it difficult to adjust themselves in the 
social world; they are unable to earn a living for them- 
selves; they become involved in crimes; their behavior 
is often anti-social; and they may become a burden for 
public support and maintenance. Medical definitions 
for mental deficiency emphasize the neurological basis of 
the lower capacity or ability. Tredgold’s definition illus- 
trates this type: “We may accordingly define amentia 
as a state of restricted potentiality for, or arrest of, cere- 
bral development, in consequence of which the person 
affected is incapable at maturity of so adapting himself 
to his environment or to the requirements of the community
as to maintain existence independently of external
support.” *

* Tredgold, A. F., Mental Deficiency (fifth ed.), Wm. Wood and Co.,
New York, 1929.

With the psychologists’ interests, many behavioristic
definitions of mental deficiency have arisen. These are
based upon a comparison of the behavior of the mentally
deficient with that of the normal individual. They are
definitions in terms of how he does things as compared
with others of his own chronological age. For such
purposes comparisons are made of educational attainments
and of performances on various mental test problems.
Psychological studies have also been influential
in giving us a definition of mental deficiency in terms of
a mathematical, statistical, or percentage concept. Such
definitions are the direct result of mental testing and of
the hypothesis of a normal distribution of intelligence in
the whole population. Mental deficiency is regarded as
the lower end of the distribution curve, and is defined
accordingly as a certain percentage at the lower end.
The exact percentage is arbitrary. If taken low enough
to mean cases commonly found under institutional care,
it would amount to less than one per cent of the distribution
of all individuals. If taken to include all those incapable
of supporting themselves without assistance, it
would probably be two or three per cent. Common
practice among mental test experts is to draw the line
so as to class about one per cent in the group designated
as feebleminded.
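
The percentage definition can be made concrete with a short calculation. The Python sketch below is an illustration added to the text, not part of the original method: it assumes an ideal normal curve and finds how many standard deviations below the mean the dividing line falls when the lowest one, two, or three per cent is marked off. The function name `z_cutoff` is hypothetical.

```python
from statistics import NormalDist


def z_cutoff(fraction_below):
    """Standard score below which the given fraction of a normal curve lies."""
    return NormalDist().inv_cdf(fraction_below)


# Cut-offs for the lowest 1, 2, and 3 per cent of the distribution.
for per_cent in (1, 2, 3):
    print(per_cent, round(z_cutoff(per_cent / 100), 2))
```

Under this assumption a one per cent line lies a little more than two and three-tenths standard deviations below the mean, and the line moves toward the mean as the percentage is enlarged.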

So there are several viewpoints from which we may 
consider the quality of intelligence, and each viewpoint 
gives us a somewhat different definition. All the defini- 
tions are important in giving us a total view of the prob- 
lem of mental deficiency. Measurement in psychology 
concerns itself chiefly with the psychological and statisti- 
cal definitions, but the application of the measurements 
cannot neglect legal and sociological aspects of the prob- 
lems. 


II. Degrees and Types of Mental Deficiency 

On the basis of psychological measurements, mental 
defectives are often classified into three groups according 
to degree: (1) the moron, (2) the imbecile, and (3) 
the idiot. The moron is one who at maturity shows a 
mental development only to the point of the normal 8- 
to 10-, 11-, or 12-year-old. (There is some disagreement 
among investigators as to the upper limit.) Imbeciles 
show a development to between the 3- and 7-year level; 
and idiots do not develop beyond a 2-year level in ability.
These classifications can be defined in terms of school or 
social capacities, but since we can measure mental ages 
by psychological tests, the age definitions are more use- 
ful. 

Mental deficiency can also be classified into types ac- 
cording to causal factors. The cretin is a mental defec- 
tive from thyroid gland deficiency, the microcephalic 
from having a small brain, the paretic from syphilitic 
infection of the brain, the hydrocephalic from fluid in the 
cranial cavity, and so on. From the standpoint of mental
measurements these types are of little significance,
since the measurement of mental ability is a quantitative 
estimation without reference to cause, and since these 
types are not homogeneous within themselves as to 
mental ability. A cretin may range from borderline dull- 
ness to idiocy; a paretic may be just below normal or a 
total idiot. Some of the types occasionally show normal 
or superior intelligence. 

III. Why Measure Mental Deficiency? 

Mental deficiency is a handicap to the individual in 
any situation in which he must adapt himself to new 
surroundings, learn new modes of behaving, adjust him- 
self to the behavior of others, or solve problems requir- 
ing more than the minimum of judgment or reasoning. 
Problems of mental deficiency may arise in the manage- 
ment of a child in the home; in the instruction of a child 
in school; in the disposition of a delinquent in a juvenile 
court; in the guidance of an adolescent toward a vocational
career; in the handling of an inmate of an institution
for the mentally disordered; or in the selection and
placement of employees in an industrial plant. In all of 
these instances and in many others the objective meas- 
urement of the mental deficiency may be very useful.

Application of a definite measuring stick to the mental
abilities of the problem individual may point out defi- 
ciencies not evident through general observation; and 
even though the presence of deficiency may be generally 
recognized, a measurement of the degree of defect is an 
important guide in the best solution of the problem. In 
school, the definite measurement of the amount of a 
child’s deficiency may indicate whether he will profit 
more from instruction in the regular grade of a school 
or from instruction in a special atypical school. In an 
institution, measurement of amount of deficiency of a 
patient may suggest beneficial means of therapy and oc- 
cupational training. In industry, a knowledge of the 
level of mental capacity of a failing employee may tell 
the personnel officer what shift in duties is necessary if 
the employee is to continue in the organization with a 
reasonable expectation of success and efficiency. With- 
out measurement, mental deficiencies often exist and 
cause various problems in our relations with human be- 
ings without our suspecting that the deficiency is at the 
bottom of our trouble; we may confuse it with another 
defect or deficiency, or it may be covered or hidden by 
some other trait. Poor vision or poor hearing may simulate
feeblemindedness in a school child; a glib tongue and
a pleasing manner may conceal a deficiency in an adult. 

IV. Types of Tests Suitable for Measuring Mental Deficiency

The nature of mental deficiency suggests the following 
guides in the selection of mental tests to be applied to 
those low in the scale of intelligence: 

1. The tests should not require too great concentration 
and sustained attention on the part of the subject. Mental
defectives are likely to give poor attention to the
problems presented to them. If left to their own inclina- 
tions to answer a list of questions of a mental test, they 
will often fail because of inattention rather than actual 
inability to arrive at correct answers. Sustained atten- 
tion may be a correlate of intelligence, but attention 
probably should not be the sole determining factor in 
arriving at an estimate of the mental level of a subject. 
This requisite of tests for measuring the mentally defi- 
cient favors the use of individual tests as compared with 
group tests. If group testing is done, only small groups 
should be tested at one time. 

2. Tests should not depend too much upon schooling. 
The mentally deficient are likely to be backward in 
schooling. A mental test in terms of arithmetic, vocabu- 
lary, and reading, which may be suitable for superior 
persons with equal backgrounds of education, is likely to 
be primarily an educational achievement test for the 
mentally deficient. In view of this difficulty it is often 
desirable to choose a mental test which is given orally, 
and therefore does not depend upon the ability of the 
subject to read and write. 

3. Tests for the mentally deficient should not depend 
too greatly upon language ability. General intelligence 
as we ordinarily define it does not manifest itself alone 
in language. It also manifests itself in dealing with more 
concrete things, and the deficient individual is perhaps 
more likely to be able to express himself through a concrete
medium. Linguistic expression also has the disadvantage
of being too closely related to formal schooling.
We thus find the mentally deficient often being tested 
by picture tests, design tests, or manual-performance 
tests.

Following are some samples of representative tests
which have been found useful in measuring the mentally 
deficient. These generally include (1) the individual age 
scales; (2) performance tests; (3) picture and design 
tests; and (4) simple written group tests. 

1. The Individual Age Scales 

It seems appropriate that we examine first that type 
of measuring instrument which has proved most valuable 
in measuring mental deficiency. This instrument is exemplified
in the Binet-Simon Scale and its various translations
and modifications, all individually and orally
administered scales.

We have already become acquainted with Binet be- 
cause of his importance in the history of psychological 
measurement. It is of particular interest to us now that 
Binet’s scales for measuring intelligence were developed 
primarily to solve certain problems which he faced in 
dealing with mentally deficient school children; and that 
his original scales were particularly suited to measuring 
such children. Indeed, Terman, in his American re- 
vision of the test, found it desirable to extend and revise 
the upper end of the original scale before it was suitable 
for testing at superior levels of ability. 

In many ways, Binet’s efforts in developing his intelli- 
gence scale represented a marked departure from previous 
and contemporary efforts at testing mental traits. He 
believed, first, that mental ability or intelligence should 
be tested in terms of complex mental processes instead of 
in terms of simple sensori-motor performance; second, 
that mental ability could be adequately measured only 
by presenting to the subject a variety of different tasks or 
problems; third, that a general mental ability rating
could be obtained by a composite of the subject's per- 
formance on a variety of tasks of varying difficulty; 
fourth, that the tasks or problems should depend on gen- 
eral rather than specific school experience; and fifth, that 
tasks and problems to be used in mental scales should be 
studied and standardized as to difficulty by trials on 
groups of individuals chosen at random. 

Following these principles, Binet had by 1905 worked 
out and published his first scale. It consisted of thirty 
tasks or tests arranged in order of difficulty, as follows: 

1. Ability to follow with the eyes a lighted match 
slowly moved before the eyes. 

2. Prehension provoked by a tactile stimulus. (A 
piece of wood is brought into contact with the palm 
or back of the child's hand to see if he will seize it 
and carry it to his mouth.) 

3. Prehension provoked by a visual stimulus. (Cube 
of wood merely shown.) 

4. Recognition of food. (A piece of chocolate and a 
piece of wood are presented.) 

5. Quest of food complicated by a slight mechanical 
difficulty. (Candy wrapped up in paper is pre- 
sented.) 

6. Execution of simple commands and imitation of sim- 
ple gestures. 

7. Verbal knowledge of objects. (Naming of parts of 
the body and familiar objects.) 

8. Verbal knowledge of pictures. 

9. Naming of designated objects. (Common objects 
on a picture must be named.) 

10. Comparison of lengths of two lines. 

11. Repetition of three digits. 

12. Comparison of two weights. (3 and 12 grams.) 

13. Suggestibility. (Asking for objects that are not 
present, etc.) 

14. Definition of objects. 

15. Repetition of sentences.

16. Comparison of two objects. (Giving differences between
a fly and a butterfly, etc.)

17. Memory for things in a picture. (Picture shown for
30 seconds, after which child names objects seen.) 

18. Drawing a design from memory. (Designs are 
shown for 10 seconds.) 

19. Repetition of digits. 

20. Resemblance of known objects. (In what way are 
a poppy and blood alike? etc.) 

21. Comparison of lengths of lines. 

22. Comparison of weights. 

23. Memory for weights. (After blocks have been cor- 
rectly placed in order of weight, one is taken away 
and the subject must find where the gap is.) 

24. Rhymes. (Finding rhymes to a given word.) 

25. Completion of sentences. (Supplying missing word 
to complete a sentence.) 

26. Making up a sentence to contain three given words. 

27. Comprehension of questions. (Twenty-five questions
of varying difficulty to be answered.)

28. Reversal of the hands of the clock. (To be done 
from memory.) 

29. Paper cutting. (Paper folded twice and triangular 
piece cut out. Subject must draw result without 
seeing unfolded.) 

30. Definitions of abstract words. 

This scale was not standardized according to age, but 
Binet gave rough standards of performance to be ex- 
pected from normal children of various ages. 

His own experience with the scale, and the criticisms 
of coworkers, led Binet to make two revisions — one in 
1908 and one in 1911. These both constituted an im- 
portant advance in having tests grouped according to 
age so that mental ability could be worked out as a defi- 
nite measure expressed as “mental age.” Their char- 
acteristics are essentially those of the Stanford Revision. 

The Stanford Revision of the Binet Test. This revision
of the Binet Scale stands, for several reasons, first
among a number of American revisions: it is based upon
the most thorough and extensive preliminary study of
the test material; it is well standardized as to both procedure
and placement of the problems; it is the most
commonly used of all the revisions. Terman, who directed
the making of this revision, began his work
around 1910, and in 1916 published it with an elaborate
guide book * for use in administering and rating the test.
This scale, like the original Binet Scale, has the tests or
problems grouped according to the age at which they can
normally be answered; it also is rated in terms of a raw
score of mental age. Terman made the calculation of
mental age somewhat easier by adding enough tests
to make six tests for each age level (with some
exceptions in the upper ages). A summary of the tests
at three of the ages in the Stanford Revision follows: †

YEAR III 

1. Ability to point to parts of the body. 

2. Ability to name familiar objects. 

3. Enumeration of objects in pictures. 

4. Ability to give sex. 

5. Ability to give last name. 

6. Ability to repeat 6 to 7 syllables. 

YEAR VI 

1. Ability to distinguish right and left. 

2. Indication of missing parts from pictures. 

3. Ability to count thirteen pennies. 

4. Comprehension. Ability to answer questions. 

5. Ability to name coins. 

6. Ability to repeat 16 to 18 syllables. 

YEAR XII 

1. Vocabulary. Ability to define 40 of 100 test words. 

2. Definition of abstract words. 

3. Superior plan for finding lost ball in circular field
drawn on paper.

4. Ability to straighten out dissected sentences.

5. Interpretation of fables.

6. Ability to repeat five digits backwards.

7. Interpretation of pictures.

8. Ability to give similarities between three words.

* Terman, L. M., The Measurement of Intelligence, Houghton Mifflin
Co., Boston, 1916.

† Quoted by permission of Houghton Mifflin Company.

The calculation of mental age. In the Stanford Re- 
vision of the Binet-Simon Test (up to age X — the most 
used ages) there are six tests per year, so that each test 
passed is credited as two months of mental age. Above 
the ten-year level fewer tests are available in the scale, 
so that each correct one is credited as three, four, five, or 
six months. In the actual rating of a test the subject is 
given credit for the last year which he gets entirely correct,
with no errors in years below, and to this basal age is
added the months’ credit for additional tests answered
above this. For example, if a child answers all questions 
correctly through year VI, answers five out of six cor- 
rectly in year VII, three out of six in year VIII, and none 
above year VIII, his Basal Mental Age is six, and he re- 
ceives ten months’ credit in year VII and six months’ 
credit in year VIII. His total mental age is, therefore, 
six years sixteen months, or seven years four months. 
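
The rating rule just described can be sketched in a few lines of Python. This is an illustration added to the text, not Terman's own procedure sheet; the function names are hypothetical, and the default credit of two months per test applies only through year X, where there are six tests to the year.

```python
def mental_age_months(basal_years, passes_above, months_per_test=2):
    """Mental age in months.

    basal_years: last year level passed entirely, with no errors below it.
    passes_above: number of tests passed at each year level above the basal year.
    months_per_test: credit per test (two months where a year has six tests).
    """
    return basal_years * 12 + months_per_test * sum(passes_above)


def as_years_months(total_months):
    """Split a total in months into (years, months)."""
    return divmod(total_months, 12)


# The child in the example: basal age VI, five passes in year VII, three in year VIII.
total = mental_age_months(6, [5, 3])
print(as_years_months(total))  # (7, 4)
```

Above year X the same function could be called with a larger `months_per_test`, since each test there carries three to six months of credit.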

Average adults attain a rating on the Stanford Scale 
of XIV to XVI. Terman first gave XVI as the average 
adult level; later testings (particularly the Army testing 
during the World War) indicate that it is nearer XIV. 
Adult testing at the normal or superior level, however, is 
rarely done by a Binet type of test, so that the standards 
at the upper level probably have not been so well 
checked. 

Intelligence quotient. Terman introduced with his 
revision of the Binet Test a new mental measure which
was designed to give more meaning to the mentality 
scores derived from testing. This new measure was the 
Intelligence Quotient (often abbreviated I. Q.). It 
is calculated by dividing the individual’s mental age, as 
determined by testing, by his chronological age. The 
mental age of adults as indicated by the test is divided by 
the age which we take as representing the average adult. 
The I. Q. measure amplifies the significance of the mental 
age, since if we are without knowledge of the chronologi- 
cal age of the testee the same mental age may represent 
inferior, mediocre, or superior ability. For example, a 
mental age of 10 is superior in an eight-year-old; is average
or normal in a ten-year-old; and is inferior in a
twelve-year-old. As compared with mental age, the In- 
telligence Quotient is an expression of relative ability, 
signifying degree of “brightness” in a single measure. It 
remains fairly constant throughout life (since mental 
age and chronological age increase together up to the 
limit of mental growth or to “adult” age). 
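
A brief Python sketch of the quotient, added here for illustration, shows the three cases just named. It assumes the usual convention of multiplying the ratio by 100 and rounding to a whole number, which agrees with the I. Q. values of 60 and 70 quoted in this chapter; the function name is hypothetical.

```python
def intelligence_quotient(mental_age_years, chronological_age_years):
    """Mental age divided by chronological age, expressed on a base of 100."""
    return round(100 * mental_age_years / chronological_age_years)


# The same mental age of 10 at three chronological ages.
for chronological_age in (8, 10, 12):
    print(chronological_age, intelligence_quotient(10, chronological_age))
```

The mental age of 10 yields 125 for the eight-year-old, 100 for the ten-year-old, and 83 for the twelve-year-old.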

As we have already suggested, mental deficiency is 
often defined in terms of I. Q. level — and the I. Q. of 70 
has been widely, almost universally, accepted as the arbi- 
trary dividing line between normal (or at least only 
borderline) ability and definite deficiency or feeblemind- 
edness. If 70 is regarded as the dividing line for feeble- 
mindedness, the upper limit for feeblemindedness in 
adults is a mental age of between 9 and 10. Terman 
worked out percentages of randomly selected individuals 
possessing mental ability at the various I. Q. levels. 
According to his tables, about one per cent of the population
is deficient enough to fall into the feebleminded
grouping. 
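
The adult mental age that corresponds to the dividing line can be worked out by inverting the quotient. The Python sketch below is an added illustration with a hypothetical function name. The adult divisor is an assumption: a divisor of 14, the lower of the two adult levels mentioned above, gives a figure between 9 and 10, while Terman's original divisor of 16 would give a higher one.

```python
def mental_age_at_iq(iq, adult_divisor_years):
    """Mental age in years that yields the given I. Q. for an adult."""
    return iq * adult_divisor_years / 100


# I. Q. 70 with the two adult divisors discussed in the text.
for divisor in (14, 16):
    print(divisor, mental_age_at_iq(70, divisor))
```

With a divisor of 14 the line falls at about 9.8 years; with 16 it falls at about 11.2 years.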

The following two examples from Terman’s * feebleminded
group illustrate testing at this low level:

* Terman, L. M., op. cit., pp. 83 and 85.

R. H. Boy, age 14; mental age 8-4; I. Q. 60. Father
Irish; mother Spanish. Family comfortable and home 
care average. Has attended school eight years and is 
unable to do fourth-grade work satisfactorily. Health 
excellent and attendance regular. Reads in fourth 
reader without expression and with little comprehen- 
sion of what is read. Fair skill in number combina- 
tions. Writing and drawing very poor. Cannot use a 
ruler. Has no conception of an inch. 

R. H. is described as high-tempered, irritable, lacking
in physical activity, clumsy, and unsteady. Plays
little. Just “stands around.” Indifferent to praise or
blame, has little sense of duty, plays underhand tricks.
Is slow, absent-minded, easily confused in thought,
never shows appreciation or interest. So apathetic that
he does not hear commands. Voice droning. Speech
poor in colloquial expressions.

Three years later, at age of 17, was in a special class 
attempting sixth grade work. Reported as doing “absolutely
nothing” in that grade. Still sullen, indifferent,
and slow in grasping directions, and lacking in play in- 
terests. No appreciation of anything, but has mastered 
such mechanical things as reading (calling the words) 
and the fundamentals in arithmetic. 

In school work, moral traits, and out-of-school be- 
havior R. H. shows himself to be a typical case of 
moron deficiency. 

A. C. Boy, age 12; mental age 8-5; I. Q. 70. From
Portuguese family of ten children. Has a feebleminded 
brother. Parents in comfortable circumstances and re- 
spectable. A. C. has attended school regularly since 
he was 6 years old. Trying unsuccessfully to do the 
work of the fourth grade. Reads poorly in the third 
reader. Hesitates, repeats, miscalls words, and never 
gets the thought. Writes about like a first grade pupil. 
Cannot solve such simple problems as “How many 
marbles can you buy for ten cents if one marble costs 
five cents?” even when he has marbles and money in
his hands. Described by teacher as “mentally slow 
and inert, inattentive, easily distracted, memory poor,
ideas vague and often absurd, does not appreciate
stories, slow at comprehending commands.” Is also 
described as “unruly, boisterous, disobedient, stubborn, 
and lacking sense of propriety. Tattles.” 

Three years later, at age of 15, was in a special class
and was little if any improved. He had, however, 
learned the mechanics of reading and had mastered the 
number combinations. Deficiencies described as “of 
wide range.” Conduct, however, had improved. Was 
“working hard to get on.” 

A. C. must be considered definitely feebleminded. 
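
The quotients reported for these two boys can be checked from their mental ages, which are given in years and months. The Python sketch below is an added illustration with a hypothetical function name. It takes R. H.'s chronological age as 14, which agrees with the follow-up at 17 three years later, and uses whole years of chronological age because months are not reported.

```python
def iq_from_ages(mental_years, mental_months, chronological_years):
    """I. Q. from a mental age in years and months and a chronological age in years."""
    mental_total_months = 12 * mental_years + mental_months
    return round(100 * mental_total_months / (12 * chronological_years))


print(iq_from_ages(8, 4, 14))  # R. H.: 60
print(iq_from_ages(8, 5, 12))  # A. C.: 70
```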

2. Performance Tests 

The term performance test has generally been applied
to the type of test in which the response is some motor 
or manual manipulation rather than a verbal response. 
Such tests do not depend upon language responses, and 
as a rule are so given that there is no necessity for de- 
pendence upon verbal directions as to procedure. When 
these tests are not so complex as to approach tests of 
mechanical aptitude, they are often useful for measuring 
general ability or intelligence. They are usually rela- 
tively simple as to level of difficulty, they practically 
never depend upon a medium of expression learned by 
formal schooling, and sometimes they test aspects of 
general ability unrevealed by the verbal tests. They 
have been widely used as supplements to the Binet Scale 
and other language tests in the measurement of mental 
deficiency. They are also useful in testing various
other special groups, such as the foreign-born who do
not speak the language of the verbal tests, the deaf
and blind, and the illiterate.

There are many types of performance tests. One of 
the most commonly used is the form board test, in which 
the subject being tested fits variously shaped pieces into
corresponding depressions or cut-out places in a board. 
This type of test is a very old device. Before the middle 
of the nineteenth century Seguin had devised a form 
board for use in training mentally deficient children, 
although the board was not utilized as a means of mental 
measurement until over fifty years later, by Norsworthy 
in the study of mentally deficient children. Another 
commonly used type of performance test is the puzzle 
block test, in which a number of variously cut pieces
must be put together to form a picture or object — some- 
what on the order of our recently popular “jig saw” 
puzzles. Various other types of construction, object 
completion, and tapping tests have been utilized. At- 
tainment on these tests usually depends upon accuracy 
of the manual performances and upon speed with which 
the acts are done, some of the tests being graded on first, 
second, and third trials. 

There are available several performance scales for 
measuring intelligence. These consist of a combination 
of several different performance tests (usually ten or 
more), the group having been standardized so that total 
performance on the tests can be interpreted in terms of 
a mental age or some other intelligence measure. 

The Pintner-Paterson Performance Scale is particularly
worthy of mention because of its adequate standardization
and wide use, especially with children. The scale
is composed of fifteen tests, as listed below. See also
Fig. 1.⁵

⁵ Reproduced by permission of C. H. Stoelting Company, Chicago,
Illinois.

1. Mare and Foal Board. This is a picture board of
a mare and foal with a number of cut-outs which
the subject has to put in the correct places. It is
very simple, resembling a child’s game, and serves
as a very good introduction for children. Time and
number of errors are recorded.

2. Seguin Form Board. Ten blocks representing com- 
mon geometrical forms are to be placed in their ap- 
propriate places. The time of the shortest of three 
trials is recorded. 

3. Five-Figure Board. Five geometrical figures, each
divided into two or three pieces, are to be placed in
their appropriate places. Time and number of errors
are recorded.

Fig. 1. Tests in the Pintner-Paterson Performance Scale.

4. Two-Figure Board. Nine pieces to be placed in 
two spaces. Time and number of moves are
recorded.

5. Casuist Board. A more difficult board, consisting 
of four spaces into which have to be fitted twelve 
blocks. Time and number of errors are recorded. 

6. Triangle Test. Four triangular pieces to be fitted 
into the board. Time and errors are recorded.

7. Diagonal Test. Five variously shaped pieces are
to be fitted into a rectangular frame. Time and 
moves are recorded. 

8. Healy Puzzle A. Five rectangular pieces are to
be fitted into a rectangular frame. Time and moves 
are recorded. 

9. Manikin Test. Subject has to put together legs, 
arms, head and body to form a man. There is no 
board into which the pieces fit. Quality of per- 
formance is scored. 

10. Feature Profile Test. In the same manner as in
the previous test, subject has to put together pieces 
to form a head. Time is recorded. 

11. Ship Test. This consists of the picture of a ship 
cut into ten pieces of the same size and shape; 
these are to be fitted together properly into a rec- 
tangular frame. Quality of performance is scored. 

12. Picture Completion Test. Subject is required to 
select the appropriate block out of many possible 
blocks to complete the picture. Quality of
performance is scored.

13. Substitution Test. A sheet of paper with rows of 
geometrical figures upon which the subject has to 
write the proper digit following the key at the top 
of the page. Time and errors are compounded into 
a score. 

14. Adaptation Board. This is a simple test for meas- 
uring the ability of the subject to keep his atten- 
tion upon a moving board. Number of correct 
moves is recorded. 

15. Cube Test. Four cubes are tapped in a certain 
order, and the subject is required to watch and then 
imitate the movement. Number of combinations 
correctly imitated is recorded. 

Each of the tests has been standardized individually so 
that a mental age is derived from the performance. The 
median mental age on the whole series is taken as the 
final mental rating. 
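
The rule just stated, taking the median of the separate test mental ages as the final rating, can be sketched in a few lines of Python. This is only an illustration: the test names come from the list above, but the mental ages (in months) are made-up values, not data from the scale, and the function names are hypothetical.

```python
from statistics import median


def final_mental_age(subtest_ages_months):
    """Return the median of the per-test mental ages (in months)."""
    if not subtest_ages_months:
        raise ValueError("at least one test result is required")
    return median(subtest_ages_months.values())


def as_years_months(months):
    """Format a mental age in months in the 'years-months' style of the text."""
    whole = int(round(months))
    return f"{whole // 12}-{whole % 12}"


# Hypothetical mental ages (months) on five of the fifteen tests.
example = {
    "Mare and Foal Board": 96,
    "Seguin Form Board": 102,
    "Manikin Test": 90,
    "Ship Test": 108,
    "Cube Test": 99,
}

rating = final_mental_age(example)
print(as_years_months(rating))
```

Using the median rather than the mean keeps one unusually good or poor performance on a single board from shifting the final rating very far.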



Another well-known performance scale is the Army 
Performance Scale. This was worked out during the 
World War for testing foreigners, illiterates, and the re- 
cruits who were of such low ability that they failed the 
group tests. It consists of ten tests, three of which are 
also in the Pintner-Paterson Scale, as follows: 

1. The Ship Test. (See 11 above.) 

2. Manikin and Feature Profile. (See 9 and 10 
above.) 

3. Cube Imitation. (See 15 above.) 

4. Cube Construction. Putting together small cubes 
painted on certain surfaces in such a way as to 
make a larger block painted on certain of its sur- 
faces. 

5. Form Board. 

6. Designs. Copying a series of figures from memory. 

7. Digit-Symbol Test. Same test as used in Army 
Beta Examination. 

8. The Maze. Similar in principle to ones used in 
Army Beta Examination. 

9. Picture Arrangement. A series of “Foxy Grand- 
pa” pictures placed out of order to be arranged in 
story sequence. 

10. Picture Completion. Missing parts of pictures to 
be supplied. 

This Army scale was standardized in point scores, 
which were then translated into the Army letter rating 
scale. Equivalent mental ages for the point scores have 
been calculated. This scale has not been standardized 
on children, but the mental age equivalents are Binet 
mental ages. 

How shall we evaluate performance tests as measures 
of mental capacity? They quite obviously measure 
mental ability in terms somewhat different from those 
in which the more familiar verbal tests measure it. Are 
these terms so different that we should not accept the
performance ratings as indicative of mental ability? 
Generally high correlations have been found between 
performance scale ratings and Binet mental ages, at least 
for the lower levels of ability where the performance 
scales are most often used. Buford Johnson, working 
extensively with the Pintner-Paterson Scale, found a cor- 
relation of .83 between performance mental ages and 
Stanford-Binet mental ages. In the Army studies on
American-born troops, performance ratings and
Stanford-Binet mental ages correlated .84. Such
studies lead us to rely upon performance scales as fairly 
valid measures of mental ability, particularly useful in 
those situations in which, for any reason, there is a lan- 
guage handicap in the subject. 
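
The correlations cited above are product-moment coefficients between two sets of mental ages obtained on the same subjects. As a rough sketch of how such a coefficient is computed, the following Python function applies the usual Pearson formula. The two lists of mental ages (in months) are invented for illustration only; they are not Johnson's or the Army's data, and the function name is an assumption.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length, at least two pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical mental ages in months for six subjects (illustrative only).
performance_ma = [84, 96, 102, 110, 120, 90]
binet_ma = [88, 92, 108, 106, 124, 96]

print(f"r = {pearson_r(performance_ma, binet_ma):.2f}")
```

A coefficient near 1.00 means that subjects rank in nearly the same order on both scales; values such as .83 and .84 indicate close, though not perfect, agreement.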

3. Picture and Design Tests 

Like performance tests, picture and design tests are 
devised to avoid the great dependence of mental meas- 
urement upon verbal understanding and response. As 
compared with performance tests they have the advan- 
tage of usually being simpler to administer, of requiring 
less complicated equipment for testing, and of being 
suitable for group administration. Fig. 2 illustrates a 
test of this type. For Army testing during the World 
War, a whole test (the Army Beta) of different parts 
was developed utilizing only pictures, designs, and symbols.
Chapter III contains illustrations from this test.

Porteus, in connection with his work with the feeble- 
minded at Vineland, devised a complete mental scale, 
standardized according to ages from 3 to 14, in terms of 
a single type of design, the maze. There is a maze for
each age. The simplest for age 3 consists of a diamond 
with sides of parallel lines about a quarter of an inch 
apart. The subject is tested by his ability to trace
around the diamond, keeping within the parallel lines. 
At the more advanced years the subject is required to 
thread his way without crossing lines through a typical 
maze with many blind alleys. Fig. 3 illustrates mazes 
from the test.⁶ Porteus computes mental age on this
test by the highest test passed, with certain deductions
for lower years that may be failed.

Fig. 2. Test One, International Intelligence Test.

This scale is easily administered, usually attracts the
interest of children and persons of lower-grade mentality,
and is applicable to groups who cannot be fairly measured by
language tests. Its limitation as to type seems to be its
chief drawback as a general intelligence test; it tests only
one kind of response, while intelligence is manifested in
many ways. However, in fairness to its author and users
it should be stated that the test was designed as a supplement
to such commonly used types as the Binet Scale.
According to Porteus, the maze scale was devised to
measure “planning capacity, prudence, and mental alertness
in a new situation of a concrete nature.” Porteus
believed the type of pre-considered action demanded by
these tests was important in social as contrasted with
educational efficiency.

Fig. 3. Porteus Mazes.

⁶ Reproduced by permission from Maze Tests and Mental Differences,
the Vineland Training School, 1933.

The utilization of various picture and design tests in 
kindergarten mental measurements should be mentioned, 
although these are just as useful for measuring mental 
superiority in the very young as for measuring mental 
deficiency. The Haggerty Tests, the Pintner-Cunning- 
ham Test, and the Dearborn Tests are typical; several 
others, too, might be mentioned. 




4. Simple Written Group Tests⁷

While the written group tests are not particularly 
adapted for measurement at the level of ability of the 
mentally deficient, the simpler ones and those which 
cover a wide enough range of ability to include simple 
items may sometimes be used. Such use, of course, 
always presupposes the ability to read and write in the 
subject being tested. 

The Army Alpha Test, the written group test de- 
veloped for testing recruits to the Army during the War, 
is a good example. This test measures intelligence over 
a wide range of ability — from markedly superior ability, 
possessed by those attaining high scores on the test, to 
definite feeblemindedness, possessed by those attaining 
the scores in the lower range. The lower range of Army 
Alpha questions includes such easy ones as:⁸

How many are 20 boats and 9 boats? (Form 9) 

If you save $4 a month for 9 months, how much will 
you save? (Form 9) 

If plants are dying for lack of rain, you should (water
them, ask a florist’s advice, put fertilizer around
them). (Form 6)

A house is better than a tent, because (it costs more,
it is more comfortable, it is made of wood). (Form 6)

The apple grows on a (shrub, vine, bush, tree). 
(Form 8) 

The pitcher has an important place in (tennis, foot- 
ball, baseball, handball). (Form 9) 

From Table II it may be inferred that any person obtaining
a score as low as 21 on the Army Alpha Test
must possess a mental ability low enough to put him in
the class of “mentally deficient” (I. Q. below 70, taking
average adult mental age as 16). About one in ten of
the recruits tested by the Army Alpha made scores as low as
21. Some of these obtained a somewhat higher rating
on individual tests.

⁷ For a more detailed treatment of group mental tests, see Chapter V.

⁸ National Academy of Sciences, Memoirs, Vol. XV, pp. 228-234.


Table II

EQUIVALENT MENTAL AGES AND I. Q.’S FOR
LOW ARMY ALPHA SCORES

Army Alpha     Mental     I. Q. if
Score          Age        Adult

33             12-0       75
27             11-6       72
21             11-0       69
16             10-6       66
11             10-0       62
 7              9-6       59
 4              9-0       56
 2              8-6       53
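
The I. Q. column of Table II follows from the rule given in the text: mental age divided by an assumed average adult mental age of 16, multiplied by 100. The short Python sketch below recomputes that quotient for each row. The function name and the layout of the rows are illustrative assumptions; the scores and mental ages are those of the table.

```python
ADULT_MENTAL_AGE_YEARS = 16  # divisor stated in the text for adults


def adult_iq(years, months=0):
    """I. Q. for an adult: 100 * mental age / 16, before rounding."""
    mental_age = years + months / 12
    return 100 * mental_age / ADULT_MENTAL_AGE_YEARS


# (Army Alpha score, (mental age years, months)) as given in Table II.
table_ii = [
    (33, (12, 0)),
    (27, (11, 6)),
    (21, (11, 0)),
    (16, (10, 6)),
    (11, (10, 0)),
    (7, (9, 6)),
    (4, (9, 0)),
    (2, (8, 6)),
]

for score, (years, months) in table_ii:
    print(score, f"{years}-{months}", f"{adult_iq(years, months):.2f}")
```

The unrounded quotients agree with the printed I. Q.’s to the nearest whole number, and they show why a score of 21 (mental age 11-0, quotient 68.75) is the first entry to fall below the limit of 70.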


Many of the easy group tests commonly used in the 
elementary schools can be used also in measuring mental 
deficiency, provided the deficient subject is sufficiently
literate to understand the test. For example, National
Intelligence Test raw scores of below 70 represent definite
deficiencies in those who have attained a chronological
age of as much as 16.⁹

We might conclude that with proper safeguards as to 
ability of the subject to comprehend the procedure and 
the medium in which the test is expressed, easy group 
written tests may sometimes be used in measuring mental 
deficiency. We should be inclined, however, to give any 
person scoring low on such a test the benefit of an in- 
dividual test. 


⁹ See Table IV, page 98.



CHAPTER V 


The Measurement of Superiority 


Mental superiority, as genius, attracted the
attention of many philosophers, historians, and 
poets of early times, and many such thinkers recorded 
their opinions as to the characteristics, sometimes even 
the psychological nature, of genius. Indeed it seems 
that the philosophers gave more attention to the genius 
than did our modern psychologists, scientific interest in 
high mental ability as a psychological problem having 
been late in developing as compared with the interest in 
deficiency. 

Carmichael,¹ in an address on the psychology of genius,
admirably set forth these ancient views:

¹ Carmichael, Leonard, “The Psychology of Genius,” The Phi Kappa
Phi Journal, September 1934.

In folk lore and fable the genius plays a large part.
In mythology he appears as a man set off from his fellows
by a great chasm. He is a different species, not
only quantitatively but in every sense qualitatively
distinct. Usually his uniqueness has been considered as
beginning in his peculiar origin. The Great Man was
typically held to be the child of the supernatural, the
offspring of god or demon.

This primitive notion, and who will gainsay Carlyle
and the rest and claim that there are no reasons for
its development, has persisted in modified form through
the centuries. Certainly a supernatural theory of this
sort was held by some of the greatest thinkers of antiquity.
Genius was essentially unique, a gift from
another realm, in the view of Plato and Socrates. Simi- 
larly at all times the geniuses of religion, saints and 
prophets, have also been considered as supernatural; 
as men set apart by superhuman powers. In more re- 
cent times this transcendental view has not wanted 
supporters. Schopenhauer, for example, held that the 
genius was what might be called in biological terms 
a psychological mutant. The true genius, he asserted, 
perceived universals, whereas the common man perceived
particulars and conceived universals, a
view, incidentally, which seems to have nothing in its 
favor save that it was the view of the great Schopen- 
hauer. Emerson, F. W. H. Myers, Hinkle, and more 
recently N. D. M. Hirsch have attempted by essentially 
mystical, or at any rate by non-scientific, arguments 
to develop a similar view of the true genius. Hirsch, 
for example, writing in 1931, claims, but without giving 
any evidence that would be likely to stand for a moment 
in the court of scientific psychology, that THE GENIUS
(written in capital letters) has not common intelligence
but intuition, not common volition, but a surging
will to create, not ordinary emotion but ecstasy.

As opposed to this old but still vital superior species 
view of the superman is another ancient theory of 
genius which considers the superior individual to be still
unique and unitary but abnormal. This is the view
that genius is akin to madness. Thus Aristotle wrote,
“Famous poets, artists, and statesmen suffer from mel- 
ancholia or madness as did Ajax. In recent times such 
a disposition occurred in Socrates, Empedocles, Plato, 
and many others, but especially in our poets.” Empe- 
docles himself, strangely enough, also presented a simi- 
lar view. Diderot proposed a “psychic rhythm” view 
of genius which held it to be essentially like manic- 
depressive insanity. Of all those holding this madness 
view of genius, Lombroso is probably most thorough- 
going and best known. He elaborately developed the 
theory that genius was a “biological type,” a type
stigmatized, in both mind and body and not unallied in
characteristics to other degenerate types; types, it now
appears, on the basis of much evidence, that his imagi- 
nation created rather than his science discovered. 
Others have held that genius is related to or caused by 
the physiological action of narcotics, alcohol, or the 
toxins of tuberculosis, and thus is comparable to the 
other mental abnormalities brought about by such drugs
or disease. Havelock Ellis has guessed genius to be due 
to a sensitive nervous system and organic inaptitudes, 
such as muscular incoordination, which make social ad- 
justment difficult. Freud contends that genius is akin 
to the neuroses and that it involves an atypical de- 
velopment of the sex life of the individual. Contrasted 
with this, Adler, another psychoanalyst, presents the 
view that genius is the result of an essentially patho- 
logical longing for superiority which leads the person- 
ality to flee from reality and become, through compen- 
sation, sometimes eminent, more often partly or fully 
insane. It is surely true that writers like Nisbet, 
Sanborn, and Kretschmer who have elaborately sup- 
ported this madness position have found many cases to 
substantiate their theses. Alexander, Caesar, Napoleon, 
among the generals, many of the great philosophers, 
musicians, and painters were unstable personalities, 
epileptics, given to hallucinations or to hysterical outbursts.
William Blake, to take but a single example,
had innumerable visual hallucinations. Like Luther
he really saw the devil. Once it is asserted that Blake
sketched the ghost of a flea, invisible to others, com- 
plaining that its moving mouth interfered with his 
drawing. Swift, Johnson, Shelley, Byron, Tasso, all 
acted in a manner that today would lead their friends 
to say that they should have the care of a physician 
skilled in mental hygiene. 

Today we do not look upon mental superiority and its 
highest manifestation, genius, in this light. We do not 
look upon the highly intelligent as endowed with any 
supernatural traits or any abilities the possession of
which cannot be explained by the same biological laws 
of inheritance that explain other ability levels. We do 
not look upon genius as related to madness or insanity. 
We recognize quirks of behavior or peculiarities in the 
genius as resting only upon the fact that because of his 
higher ability he is less likely to be interested in the things 
of the average intellect and for that reason may occasionally
set himself apart as “peculiar.” We look upon the
genius or person of superior ability not as representing a 
qualitative difference in traits, but only as an individual 
differing quantitatively from his fellow beings; only as 
one possessing a higher degree or amount of the same 
types of abilities seen in the average or even in the men- 
tally deficient. 

Superiority might well be defined in those same terms 
that we used in defining deficiency. Sociologically, 
mental superiority connotes an exceptional capacity for 
getting along in the world, for adjusting oneself to one’s
surroundings, for success in general. Educationally,
mental superiority manifests itself in high capacity for 
learning or high capacity for school success. Mathe- 
matically or statistically, mental superiority designates 
that certain percentage of individuals falling at the upper 
end of the scale as we measure intelligence by psycho- 
logical tests. It is this last definition or view that has 
been the basis of recent studies of mental superiority. 

I. Development of Interest in the Superior 

1. Early studies of special cases. Before the advent 
of psychological tests, interest in and study of mental 
superiority were limited chiefly to cases of child “prodi- 
gies” who from time to time attracted enough attention 
so that their cases were recorded by interested educators. 
These children were usually studied by some educator
who became interested in seeing what could be accom- 
plished by this or that method of education. One of the 
earliest records of a case of this type is contained in a 
German book published in 1779, The Life, Doings, Travels,
and Death of a Very Clever and Very Well-behaved
Four-year-old Child, Christian Heinrich Heineken of Luebeck,
Described by His Teacher, Christian von Schoeneich.
Among the remarkable achievements recorded
for him are: the ability at ten months to name things in 
pictures; the ability at one year to recite stories from the 
books of Moses; and the ability at four years to read, 
add, subtract, multiply, and divide. He is credited with 
knowing French and German, much geography, and some 
1500 Latin sayings. Unfortunately this prodigy died 
before the age of five and his accomplishments could not 
be followed to maturity. Among other infant prodigies 
written about before the days of mental testing were one 
Karl Witte, son of a German pastor; John Stuart Mill; 
and Lord Kelvin. It is difficult to estimate what mental 
test records might have shown for these young persons of 
superior mentality, but it seems certain that any quan- 
titative rating would have been high. 

Sir Francis Galton’s study of the inheritance of genius 
deserves mention in any discussion or study of high 
ability. His book Hereditary Genius, published in 1869, 
is an account of his study of several hundred individuals 
of “genius” ability selected from among famous persons 
who had lived in Europe. His general purpose was to 
show, by a careful study of family background, that high 
ability was an inherited quality. Aside from the in- 
trinsic value which his book has in relation to the ques- 
tion of inheritance, it has significance in the chronology 
of measurement of mental qualities in that it focused 
attention upon the existence of marked individual differences
in mental traits, and in that Galton suggested
the feasibility of a scale for designating degrees of ability.

2. Studies of childhood of great men. More recently, 
systematic studies have been undertaken of the childhood 
of great men. These studies have been prompted by 
generally increased interest in the traits of genius and by 
efforts to refute the popular fallacy that geniuses
often arise from stupid childhoods or have grown
out of youthful lives of physical weakness, emotional pe- 
culiarity, and similar traits. The best of these studies is 
that of Cox.² From a study of boyhood records (largely
from biographical sources) she attempted even to arrive 
at estimated I. Q.’s for her list of eminent men. She did 
not arrive at estimates of I. Q. for all of the eminent men
that would place them in the level of “genius” intellect
as we use the term in mental testing today, but her 
average I. Q. for the total 300 studied was 135, an I. Q. 
attained in less than one in a hundred tests today. While 
her data cannot be free from error (owing to the manner 
of their determination), they show rather clearly that 
the childhood of most eminent men of today and the 
past has been very similar in achievement to that which 
we now find in the child whose superiority has been ad- 
judged by one or more mental tests. 

3. Development of interest in the superior school 
child. Interest in the superior school child was much 
slower in development than interest in the subnormal 
child. This delay was due partly to the nature of the 
earlier mental tests, which were not adapted so well to 
the measurement of superior ability as to lower ability. 
The original Binet Test did not measure ability accurately
or well at the upper levels (at least not for those
above a very young chronological age). The same criticism
can be made of many of the earlier tests patterned
after the Binet and of some of the earlier group mental
tests. A second reason for late development of interest
in the superior school child (undoubtedly the basic reason
also for the types of tests developed in the early work) is
the fact that he is not the school problem that the subnormal
child is. While he may get into mischief by lack
of pursuits to test his full powers, the superior child does
not draw attention because of lack of progress and inability
to learn. It has been stated that the superior
child has been discovered by the intelligence test, and
this has certainly been true of many a superior child
whose performance would not have been spectacular
without the extra spurs to attainment that we can offer
him through knowledge of his superiority.

² Cox, C. M., “The Early Mental Traits of 300 Geniuses,” Genetic
Studies of Genius, Vol. II, Stanford University Press.

Today the psychologist’s and the educator’s interest in 
the superior school child is attested by the various 
schemes of classification through which we attempt to 
vary instruction to fit the level of ability, by the enriched 
curricula offered the bright child, by the rapid progress 
allowed him through the grades, and by the various spe- 
cial studies of his mental, emotional, and physical 
make-up. 

4. Terman’s studies of genius. The important work
accomplished by Terman in studying superior ability
should not be left unmentioned. Pintner³ has given an
excellent brief summary of this work.

³ Pintner, Rudolph, Intelligence Testing, Henry Holt and Co., New
York, p. 361.

About 1,000 children above I. Q. 130 were selected
for study, and these are compared with children whose
I. Q.’s were normal. It was found that among the
gifted the ratio of boys to girls was higher than that
in the general population. In racial origin these Cali- 
fornia children were found to be mainly of Western 
European and Jewish stock. The Jewish stock con- 
tributed about twice that expected from the total Jewish 
population of the areas investigated. The average 
social status of the families was much higher than that 
of the average family. In general the family incomes 
are fair, and they live in superior neighborhoods, but 
there are isolated cases from very poor families in in- 
ferior neighborhoods. These children come from fami- 
lies where there are distinguished relatives in much 
greater proportion than would be found in the average 
family. The vital statistics of the families show a 
healthier than average stock, with few cases of insanity 
or feeblemindedness. The anthropometric measure- 
ments show the gifted group physically superior. The 
medical examinations show them also superior to aver- 
age children. In school progress they are 14 per cent 
of their age above the norm in grade location, and 48 
per cent of their age above the norm in intelligence, so 
that they are underpromoted to the extent of 34 per 
cent. Their school marks are better than those of 
ordinary children. The gifted are no more uneven in 
their school abilities than ordinary children. Their oc- 
cupational ambitions are higher than those of the con- 
trol group. In general they have the same type of 
interests as ordinary children. They make more col- 
lections, particularly of a scientific nature. Their play 
interests are in general like those of the control group, 
with a somewhat greater interest in plays that require 
thinking. They are mature in their play interests,
showing a greater liking for quieter and less sociable 
games. These gifted children read a great deal more 
than does the average child. The average gifted child 
of 7 reads more books in two months than the average 
control child up to age 15, and the range of reading 
is much wider. In character and personality tests they 
are very superior, about 85 per cent of the gifted being 
above the median of the control group. 



This work of Terman, briefly summarized above, is
important because it gives us a rough picture of the 
gifted child and clears away the old ideas with refer- 
ence to the sickly prodigy and the puny bookworm. 
What we note is the tendency for desirable traits to be 
positively correlated. Along with high intelligence goes 
general all-around superiority. Of course, there are 
individual exceptions. There are children of high I. Q.
who are physically below normal, who do not like
active games, who are nervous or vain and conceited 
just as there are normal or subnormal children with 
similar undesirable traits. But such undesirable traits 
do not tend to accompany superior intelligence. On the 
contrary they are less likely to be found among children 
of high I. Q.’s than among children of average or
below-average I. Q.’s.

5. Binet Test applied to measurement of the superior 
child. The Binet Test and its revisions or modifications 
are often utilized in measuring the superior child, espe- 
cially the preschool child or young child who has not yet 
learned to read enough to take a written group test. In 
fact for the majority of the special studies of superior 
children the Binet type of test has been used as the basis 
of selection of subjects because of its generally accepted 
validity and reliability. 

The following are summaries of two interesting cases
of superior children tested by Terman⁴ with the Stanford
Revision of the Binet Test:

⁴ Terman, L. M., The Measurement of Intelligence, Houghton Mifflin
Co., Boston, 1916, pp. 98 and 101.

E. B., Girl, age 7-9; mental age 10-1; I. Q. 130.
E. B. was selected by the teachers of a small California
city as the brightest school child in the city (school
population about 800). Her parents are said to be unusually
intelligent. E. B. is in the third grade, a year
advanced, but her mental level shows that she belongs
in the fourth. The test was made as a demonstration
test in the presence of about 150 teachers, all of whom
were charmed by her delightful personality and keen
responses. No trace of vanity or queerness of any kind.
Health excellent. E. B. ought to be ready for high
school at 12; she will really have the intelligence to do
high school work by 11.

E. F., Russian boy, age 8-5; mental age 13; I. Q. approximately
155. Mother is a university student apparently
of very superior intelligence. E. F. has a
sister almost as remarkable as himself. E. F. is in the 
sixth grade and at the head of his class. Although 
about four grades advanced beyond his chronological 
age he is still one grade retarded. He could easily 
carry seventh grade work. In all probability E. F. 
could be made ready for college by the age of 12 years 
without injury to body or mind. His mother has taken 
the only sensible course; she has encouraged him with- 
out subjecting him to overstimulation. 

E. F. was selected for the test as probably one of the 
brightest children in a city of a third of a million popu- 
lation. He may not be the brightest in that city, but 
he is one of the three or four most intelligent the writer 
has found after a good deal of searching. He is proba- 
bly equaled by not more than one in several thousand 
unselected children. How impatiently one waits to see 
the fruit of such a budding genius! 
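
The quotients reported for these two cases may be checked from the ages given. The sketch below assumes the usual definition of the I. Q. as mental age divided by chronological age, multiplied by 100; the function names are hypothetical and serve only for illustration.

```python
def months(age: str) -> int:
    """Convert an age written as 'years-months' (e.g. '7-9') to months."""
    years, _, extra = age.partition("-")
    return int(years) * 12 + int(extra or 0)


def intelligence_quotient(mental_age: str, chronological_age: str) -> float:
    """I. Q. taken as 100 * mental age / chronological age (both in months)."""
    return 100 * months(mental_age) / months(chronological_age)


# E. B.: age 7-9, mental age 10-1
print(round(intelligence_quotient("10-1", "7-9")))   # 130
# E. F.: age 8-5, mental age 13
print(round(intelligence_quotient("13", "8-5")))     # 154, i.e. approximately 155
```

Working in months avoids the error of treating "7-9" as the decimal 7.9 years.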

II. The Superior School Child 

1. Occurrence and characteristics. There are avail- 
able today a great many studies surveying the intelli- 
gence of large numbers of school children. Let us 
examine some typical studies in order to gain an idea of 
the prevalence of superiority among school children as 
well as to gain an appreciation of the general distribution 
of intelligence throughout school populations. 




During his standardization of the Stanford Revision 
of the Binet Test, Terman measured 905 unselected 
school children by this individual test. He obtained per- 
centages of children at various I. Q. levels as follows: 


    I. Q.          Percentage
                    of Cases

    56-65             0.3
    66-75             2.3
    76-85             8.6
    86-95            20.1
    96-105           33.9
   106-115           23.1
   116-125            9.0
   126-135            2.3
   136-145            0.5

Pintner reports eight other studies of elementary
school children by individual tests. These are shown
with the Terman study in Fig. 4. From one study to
another there are slight differences in distribution, de-
pending upon such selective factors affecting intelligence
of groups as locality of the schools tested, occupational
pursuits of the locality, etc. But the general character-
istics of the various distributions are the same and the
differences are rather small.

Fig. 4.— The Distribution of Intelligence. (Nine Studies.)

Superiority among elementary school children as 
judged from these distributions amounts to about 20 to 30 
per cent of the total number if we include all those with 
I. Q. above 110, though some of these are not very su- 
perior; or to about one or two per cent if we include 
only those with I. Q.’s above 130 (usually called the 
“very superior” group). 
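
The share of children at or above a given level can be read off a distribution such as Terman's by adding the percentages of the upper classes. The following sketch uses the figures of Terman's table as read here (classes 56-65 through 136-145); the variable and function names are illustrative assumptions. Because the classes break at 115 and 125 rather than at 110 and 130, the sums only bracket the proportions quoted in the text.

```python
# Terman's 905 unselected school children:
# (lowest I. Q. in the class, per cent of cases in the class)
TERMAN_1916 = [
    (56, 0.3), (66, 2.3), (76, 8.6), (86, 20.1), (96, 33.9),
    (106, 23.1), (116, 9.0), (126, 2.3), (136, 0.5),
]


def percent_at_or_above(lower_limit: int, table=TERMAN_1916) -> float:
    """Sum the percentages of all classes whose lower limit is >= lower_limit."""
    return round(sum(pct for low, pct in table if low >= lower_limit), 1)


print(percent_at_or_above(106))  # 34.9
print(percent_at_or_above(116))  # 11.8
print(percent_at_or_above(126))  # 2.8
print(percent_at_or_above(136))  # 0.5
```

The proportion above 110 therefore lies between 11.8 and 34.9 per cent, and the proportion above 130 between 0.5 and 2.8 per cent, which agrees with the ranges given above.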

Group testing of elementary school children shows re- 
sults similar to those from individual tests. A report on 
over a thousand National Intelligence Tests in a New 
York public school shows 20.5 per cent with I. Q. of 110 
or above; and 2.7 per cent with I. Q. of 130 or above. 

What do tests in high school show? Surveys similar 
to those just discussed for elementary schools show an 
average I. Q. some 5 to 10 points higher, with an appre- 
ciable increase in the percentage of the total number of 
pupils who fall in the superior grades of intelligence. 
This is accounted for by certain factors at work which 
tend to “select” the pupils who go to high school. Prac-
tically all children start elementary school, but as we
examine farther and farther up the line in school grade, 
we find that a larger and larger number of pupils have 
dropped out. In relation to mental measurements it is 
important to recognize that the selection process is partly 
on the basis of intelligence. The more intelligent tend 
to go to high school because they are capable of doing 
the work and have the requisite interest and ambition, 
while the less intelligent tend to drop out. Thus, with 
those at the lower intelligence levels tending to drop out, 
we find that the distributions for high school show 
greater amounts of superiority as compared with distribu-
tions of mental measurements of elementary school chil-
dren. The contrast is even greater when we compare
college distributions of mental measurements with public 
school distributions. Fig. 5 shows graphically the selec- 
tive process of our educational systems as it affects the 
distribution of intelligence within the school group. 

Superiority among elementary school children occurs
about as frequently as it does in the general population;
in high school about one and a half times to twice as
frequently as in the general population; and in college
probably three or four times as frequently. Aside from
their test performances we find superiority in school
pupils distinguished by good school work, though often
not good enough to indicate the amount of superiority
in ability; by grade placements often ahead of the pupil’s
chronological age; by above-average participation in ex-
tra-curricular activities; and by tendency to be found
more frequently in schools of better communities and of
cities rather than rural districts.

Fig. 5.— Diagrammatic Representation of the Selective Process in
Determination of Intelligence of School Groups.





III. Tests Suitable for Testing School Groups 

Superiority among school pupils is usually discovered 
by those mental tests which have proved most suitable for 
testing of school intelligence in general. Above the pri- 
mary grades, where lack of reading proficiency is a handi- 
cap, these are generally group written tests. Those to be 
described as typical have been selected only because they 
are typical, not because they are distinctly better than 
many others which might have been selected. 

1. The National Intelligence Test. This is one of the 
tests widely used for measuring mental ability in the ele- 
mentary school. Its development was mentioned in 
Chapter III. The test covers 196 score points, divided 
approximately equally among five parts. The nature of 
the test is indicated by the directions and first few prac-
tice items in one of the forms.⁵

EXERCISE 1.— ARITHMETICAL REASONING 

Find all the answers as quickly as you can. Write 
the answers on the dotted lines. Use the sides or bot- 
tom of the page to figure on. 

1. How many cents are six cents and
   five cents? ........................ Answer ......

2. A girl earned 75 cents and spent 43
   cents. How much did she have
   left? .............................. Answer ......

3. How many nickels make a dollar? .... Answer ......

4. How many square inches are there
   in a card 7 inches long by 6 inches
   wide? .............................. Answer ......


5 From National Intelligence Test, Scale A, Form 1. Copyright by
World Book Company, Yonkers-on-Hudson, New York. Reprinted by 
written permission of the publishers. 



EXERCISE 2.— SENTENCE COMPLETION 


Write on each dotted line one word to make the sen- 
tence sound sensible and right. 

1. The apple is ......

2. Fish swim ...... the water.

3. Boys ...... girls like to ...... ball.

EXERCISE 3.— LOGICAL SELECTION 

Samples: man (body cane head shoes teeth)
         dog (blanket chain collar legs nose)
         house (cellar paint room servants walls)

In each row draw a line under each of the two words 
that tell what the thing always has. 

1. table (books cloth dishes legs top) 

2. apple (basket redness seeds skin sweetness) 

3. shoe (button foot sole toe tongue) 

EXERCISE 4.— SAME-OPPOSITE

Samples: cold ..D.. hot
         big ..S.. large
         best ..D.. worst

If the two words mean the same, write S on the dotted
line between them. If they are as different as can be,
write D between them.

1. yes ...... no

2. son ...... daughter

3. light ...... bright

EXERCISE 5.— SYMBOL-DIGIT 

Make under each drawing the number you find under 
that drawing in the key. Do each one as you come to 
it. 



[KEY: a row of nine drawings, each with one of the digits 1 to 9
printed beneath it, followed, after the words "Begin here," by a row
of drawings to be numbered. The drawings are not reproduced.]

The National Intelligence Test is available in four
forms. It is well standardized as to procedure and re-
sults. Grade norms are given for the various forms; also
age standards which make it possible to translate the
point scores into mental age and I. Q. records. These
are illustrated in Tables III and IV.⁶
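
The translation of a point score into a mental age and an I. Q. amounts to a table lookup followed by a division. The sketch below transcribes only a handful of rows from Table IV; the pupil and the function names are hypothetical, and the I. Q. is taken, as an assumption, to be 100 times mental age over chronological age.

```python
# A few rows transcribed from Table IV: Scale A point score -> (years, months)
MENTAL_AGE = {
    46: (8, 6), 63: (9, 8), 84: (11, 0),
    102: (12, 0), 117: (13, 0), 130: (14, 1),
}


def mental_age_months(score: int) -> int:
    """Look up the mental age, in months, for a tabled Scale A point score."""
    years, extra_months = MENTAL_AGE[score]
    return years * 12 + extra_months


def iq_from_score(score: int, age_years: int, age_months: int = 0) -> float:
    """I. Q. = 100 * mental age / chronological age, both taken in months."""
    chronological = age_years * 12 + age_months
    return round(100 * mental_age_months(score) / chronological, 1)


# Hypothetical pupil: exactly 10 years old, Scale A score of 102
print(iq_from_score(102, 10))   # 120.0
```

A score absent from the dictionary raises a KeyError, which is deliberate: the sketch does not interpolate between tabled scores.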

Table III

GRADE NORMS (AVERAGE) FOR SCALE A,
NATIONAL INTELLIGENCE TEST

   Grade        Number of Cases      Scale A

   Low 3             2319               36
   High 3            2008               49
   Low 4             4608               63
   High 4            2398               69
   Low 5             5376               84
   High 5            2154               89
   Low 6             4592              102
   High 6            2533              105
   Low 7             4350              117
   High 7            2132              124
   Low 8             2832              130
   High 8            1767              133
                    -----
                    37069

Table IV

MENTAL AGE EQUIVALENTS FOR NATIONAL
INTELLIGENCE SCORES

   Score      Mental Age         Score      Mental Age
  Scale A    Yrs.    Mos.       Scale A    Yrs.    Mos.

     46        8       6           92       11       5
     47        8       7           93       11       6
     48        8       7           94       11       6
     49        8       8           95       11       7
     50        8       9           96       11       8
     51        8      10           97       11       9
     52        8      11           98       11      10
     53        8      11           99       11      10
     54        9       0          100       11      11
     55        9       1          101       12       0
     56        9       2          102       12       0
     57        9       3          103       12       1
     58        9       4          104       12       2
     59        9       4          105       12       2
     60        9       5          106       12       3
     61        9       6          107       12       4
     62        9       7          108       12       5
     63        9       8          109       12       6
     64        9       9          110       12       7
     65        9      10          111       12       7
     66        9      10          112       12       8
     67        9      11          113       12       9
     68        9      11          114       12      10
     69       10       0          115       12      11
     70       10       1          116       13       0
     71       10       2          117       13       0
     72       10       3          118       13       1
     73       10       4          119       13       2
     74       10       5          120       13       3
     75       10       6          121       13       4
     76       10       6          122       13       4
     77       10       7          123       13       5
     78       10       8          124       13       6
     79       10       8          125       13       7
     80       10       9          126       13       9
     81       10      10          127       13      10
     82       10      11          128       13      11
     83       10      11          129       14       0
     84       11       0          130       14       1
     85       11       0          131       14       2
     86       11       1          132       14       4
     87       11       2          133       14       6
     88       11       2          134       14       7
     89       11       3          135       15       1
     90       11       4          136       15   [illegible]
     91       11       5


6 National Intelligence Test Manual of Directions, World Book Co.,
Yonkers-on-Hudson, 1924, pp. 37 and 39.


2. The International Intelligence Test. This test is one
more recently developed for use in certain racial studies.
It consists of 200 points divided among eight parts. The
following quotations from the test outline the parts:⁷


TEST 1— RECOGNITION OF MISSING PARTS 
Directions: In each of the pictures below some im- 
portant part is missing. For example, the first picture 
has no mouth. Take each of the fifteen pictures and 
draw in the missing part. Don’t take time to do beauti-
ful drawing, but merely show that you know what part
is missing. Work as fast as you can. (See page 78 
for sample.) 

TEST 2— INFORMATION AND OBSERVATION 
Directions: If the statement is true, draw a circle 
around the T; if it is false, draw a circle around the F.

Sample: T (F) The sun rises in the west.

T F 1. As a rule, the temperature is higher during the 
day than during the night. 

T F 2. More thunder storms occur on hot days than 
on cold days. 


TEST 3— MEANING OF WORDS 


Directions: If a pair of words mean the same or
nearly the same, draw a circle around the S; if they mean
the opposite, draw a circle around the O.

Samples: little   small   (S)  O
         good     bad      S  (O)

1. high    low     S  O

2. anger   wrath   S  O

TEST 4— DISCRIMINATION

Directions: If the situation described below could
be true or is possible, draw a circle around the P; if it
could not be true or is impossible, draw a circle around
the I.

Sample: P (I) His younger brother was three
years older than he.

P I 1. He could count before he could read.

P I 2. Even when he was placed with his back to the
wall, just before he was killed by the firing
squad, he protested his innocence, and years
afterward he marveled that he could lie at
such a moment.


7 Quoted by permission of the Center for Psychological Service, Wash-
ington, D. C.

TEST 5— RELATIONSHIP 
Directions: In each item below the first two words 
are related in some way. Find how they are related. 
Then write a number on the line to show which of the 
last four words is related to the third word in capital 
letters in the same way that the second word in capital 
letters is related to the first. 

Sample: SKY : BLUE :: GRASS : (1) table
   (2) green (3) warm (4) big .... 2

1. MOUSE : ELEPHANT :: LITTLE : (1) animal
   (2) big (3) small (4) cat ....

2. WOOD : SOLID :: WATER : (1) liquid (2) ice
   (3) air (4) wet ....

TEST 6— COMPREHENSION 
Directions: Below are two sets of proverbs. For each 
proverb in Section B there is a proverb in Section A 
which means the same or nearly the same. On the line 
before each proverb in Section B write the number of the 
proverb in Section A which means most nearly the same. 
[Proverbs follow] 

TEST 7— REASONING 

Directions: Solve the following problems. Place the
answers on the lines at the right. Use the space at the
bottom of the page for any figuring needed.

1. How many men are 6 men and 7 men? ......

2. How many toes has a man who has lost one
   on each foot? ......

TEST 8— UNDERSTANDING DIRECTIONS 
The directions tell you to indicate certain facts in the 
five columns at the right of the table. You will find all 
the information you will need in the table. Read each 
of the ten directions and do exactly what it tells you. 




Table V shows norms for the International Intelligence 
Test. 

Table V

NORMS FOR INTERNATIONAL INTELLIGENCE TEST

   Grade                      Score

   Elementary School:
      Fourth Grade              39
      Fifth Grade               55
      Sixth Grade               76
      Seventh Grade             90
      Eighth Grade             101

   High School:
      Freshman                 119
      Sophomore                129
      Junior                   142
      Senior                   148


3. The Otis Intelligence Scale. The Advanced Exam- 
ination of Otis’s Group Intelligence Scales is one of the 
most used mental tests for measurement at the high 
school level today. Its parts, indicated by name and 
sample questions, are outlined below.⁸ The test has been
standardized for grade norms, age norms, and mental age 
equivalents. 

TEST 1— FOLLOWING DIRECTIONS 

Sample Problem: Write the fifth letter of the
alphabet. ( ) 

TEST 2— OPPOSITES 

Directions : Look at the first word in each line, think 
what word means exactly the opposite of it, find that 
word among the five words in parentheses in that line 
and draw a line under it. 

up (short, down, small, low, young)
hot (warm, ice, dark, cold, fire)


8 From Otis Group Intelligence Test, Advanced Examination. Copy-
right by World Book Company, Yonkers-on-Hudson, New York. Re-
printed by written permission of the publishers.

TEST 3— DISARRANGED SENTENCES 

Directions: The words on each line below make 
one sentence if put in order. If the sentence the words 
make is true, underline the word true at the side of the 
page. If the sentence they would make is false, under-
line the word false.

men money for work (true false) 

uphill rivers flow all (true false) 

ocean waves the has (true false) 

TEST 4— PROVERBS

Directions: Read each proverb, find the statement 
that explains it, and put the number of that statement 
in the parenthesis before the proverb. 

Proverbs 

(3) The burnt child fears the fire.

( ) Rome was not built in a day.

( ) There is no smoke without fire.

Etc. 

Statements 

1. Time is required to produce anything of value. 

2. Failure follows frequent change of plan. 

3. Unhappy experiences teach us to be careful. 

4. Those in disgrace always want to disgrace others.

5. There is no result without a cause.

Etc. 

TEST 5— ARITHMETIC 

Directions: Place the answer to each problem in the 
parenthesis after the problem. Do any figuring you 
wish on the margin of the page. 

1. If a boy had 10 cents and earned 6 cents,
   how much money did he have then? ........... (    ) cents

2. If a man walks east from his home 7
   blocks and then walks west 4 blocks, how
   far is he from home? ....................... (    ) blocks

3. If a wire 20 inches long is to be cut so that
   one piece is 2/3 as long as the other piece,
   how long must the original piece be? ....... (    ) inches

TEST 6— GEOMETRIC FIGURES 

Directions: Each problem asks a question that is 
answered by a number. Write the answer to each 
problem in the parenthesis after the statement of the 
problem. 



Look at the figure. What number 
is in the circle but not in the rec- 
tangle? ( ) 


TEST 7— ANALOGIES 

Directions: The first sample means: Finger is to 
hand as toe is to what? Underline the word on each 
line that should go in the parenthesis in place of the
question mark. 

finger: hand — toe: (?) foot, knee, arm, shoe, nail

TEST 8— SIMILARITIES TEST 

Directions: Find the way in which the first three 
things on a line are alike. Then look at the five other 
things on the same line and draw a line under the one 
that is most like the first three. 

hat, collar, glove ..... hand, cane, head, shoe, house

rose, daisy, violet .... bush, red, plant, bed, pansy

desk, bed, chair ....... book, table, floor, pencil, coat

TEST 9— NARRATIVE COMPLETION 

(The test consists of a story with words frequently 
deleted. These words may be filled in by choosing 
words from columns placed at the right of the story.) 




Directions: For each numbered blank in the story,
choose the best word of the three in the list having the
same number as the blank. Underline the word you
choose. You may write these words in the blank spaces
if you wish, but only the underlining counts. Do
nothing about the blanks that are not numbered.

The Reward of Kindness            Underline words here:

Once upon a ..1.. there           1. time     place    man
was a ..2.. that lived in a       2. man      lion     dog
..3.. etc.                        3. street   garden   forest


TEST 10— MEMORY 

(First a story is read to the class. Then the class 
answers from memory questions about the story.) 

Directions: Read each question, and if the right an-
swer according to the story is yes, draw a line under
the word yes. If the right answer is no, draw a line
under the word no. But if you do not know the right
answer because the story didn’t say, draw a line under
the words didn’t say.

Was the story about a king? ............. (yes  no  didn’t say)
Was the king’s daughter 16 years old? ... (yes  no  didn’t say)
Was she ugly? ........................... (yes  no  didn’t say)


IV. Mental Tests for the College Level 

College students represent the most highly selected 
school group with which the mental tester deals. While 
college students are not usually measured in terms of 
mental ages and Intelligence Quotients, checks on stu- 
dents who have been tested earlier in their school career 
and are entering college show only rare instances of col-
lege entrants with I. Q.’s below 100, and we can recall by
way of comparison that the average elementary-school 
intelligence quotient is about 100. 

Immediately following the development and use of the 
Army mental tests during the World War, the Army
Alpha Test was administered to many college students,
and the records from its use afford us a means of judging 
the superiority of college students as compared with the 
general population as measured in the draft by the Army 
tests. Various of the studies which have been published 
give college average scores on the test from about 80 to 
160. The central tendency for all colleges is between 
130 and 140, total Army Alpha point score. The average, 
based upon over a million tests, for the drafted man was 
63; only about one per cent reached a score of 135. The 
average score for the white officers in the Army was 135,
the same as the average for college students in general. 
While these comparisons may be a little overdrawn, ow- 
ing to the greater facility which the student has for test 
procedures because of recent academic experience, the dif- 
ferences are wide enough, even with liberal allowances, to 
indicate that the college group is an extremely selected 
group in the direction of superiority. 

A few examples of college mental tests used at the 
present time will be discussed. 

1. The Mental Alertness Test.⁹ The author has
used this test to considerable advantage in testing college 
freshmen at the George Washington University. It con- 
sists of five parts covering 200 score points. The parts 
measure Vocabulary, General Information, Arithmetical 
Reasoning, Reading Comprehension, and Learning Abil- 
ity. The parts are rather similar in nature, except for 
a higher level of difficulty of questions, to the parts found 
useful at the lower levels of ability (elementary and high 
school). This particular test contains a rather unique 
method of testing learning ability through the use of a 
preliminary study sheet, from which the student learns
during the test period certain things to be asked about
later. The average college freshman attains a score of
125 out of 200 points; only 10 per cent exceed a score
of 165.

9 Published by the Center for Psychological Service, Washington,
D. C.

2. The Thorndike Intelligence Test.¹⁰ Immediately
after the war Thorndike began experimentation with 
mental tests for college students at Columbia University. 
This work led to the development of the Thorndike In- 
telligence Examination for High School Graduates, now 
available in many forms and published in a new form 
for testing each year. This test is the most comprehen- 
sive of the college mental tests, consisting of three parts, 
and containing a total of several hundred points. It re- 
quires almost three hours to give, as compared with 
approximately an hour for most other tests. The test 
contains an unusually large proportion of material meas- 
uring reading-comprehension ability, an ability of un- 
doubted importance to success in pursuit of college work, 
and an ability which has been demonstrated to be closely 
correlated with intelligence in the literate individual. 

The Thorndike Test has been extensively used in col- 
lege testing throughout the United States, and many 
studies of its value have been made. In many instances 
exceptionally high correlations have been obtained be- 
tween the Thorndike test records and success of the stu-
dents in college work. Wood,¹¹ in an extended study of
the test at Columbia University, reports a correlation 
coefficient of .67 between the test and scholarship in 
college, as compared with a coefficient of only .26 between 
scholarship in college and high school marks. 
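
One common way of weighing two such coefficients against each other, offered here as an illustration and not as part of Wood's report, is to square them: the square of a correlation coefficient gives the proportion of the variance in one measure that is associated with the other. The function name below is hypothetical.

```python
def shared_variance(r: float) -> float:
    """Proportion of variance associated with a correlation coefficient r."""
    return r ** 2


# Thorndike test and college scholarship
test_vs_scholarship = shared_variance(0.67)
# High school marks and college scholarship
marks_vs_scholarship = shared_variance(0.26)

print(round(test_vs_scholarship, 3))    # 0.449
print(round(marks_vs_scholarship, 3))   # 0.068
```

On this reading the test is associated with roughly 45 per cent of the variation in college scholarship, against about 7 per cent for high school marks.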


10 Published by the Bureau of Publications, Teachers College, Co-
lumbia University, N. Y.

11 Wood, Ben D., Measurement in Higher Education, World Book
Co., Yonkers-on-Hudson, N. Y., 1923.

3. The American Council Psychological Examina-
tion.¹² The American Council on Education publishes,
for college and high school, a mental test prepared under 
the direction of Thurstone, which is becoming widely 
used in colleges. It contains five parts, designated re- 
spectively as Completion, Arithmetic, Artificial Language, 
Analogies, and Opposites. Like the Thorndike Ex- 
amination this test is published in a new edition each 
year. It is more easily given and rated than the Thorn- 
dike test. 


V. Measuring the Superiority of Adults 
Outside of College 

Certain problems in the measurement of superiority in 
adults outside of school and college make the develop- 
ment of special tests for this purpose desirable, although 
few such tests have been developed. The unsuitability 
of the tests designed for college use rests primarily upon 
three objections: (1) They are likely to have too much 
of an academic flavor, reflected in their general nature 
and subject matter. (2) They are likely to depend too 
largely upon high-school content material. (3) They 
often emphasize speed to too great an extent. The valid- 
ity of these objections rests primarily upon the fact that 
adult groups outside of school and college are usually 
very heterogeneous with regard to nature of academic 
background and as to recent experience of an academic 
nature. 

Some of the mental tests developed for selecting em-
ployees might illustrate special adult tests. The United
States Civil Service Commission under the direction of
O’Rourke has devised a series of mental tests at four
levels of ability for testing adult applicants for federal
positions. The highest level of these tests reaches the
ability requisite for high-grade professional and technical
positions.

12 Published by the American Council on Education, Washington,
D. C.



CHAPTER VI 


Intellectual Measurement of the Insane 

The need for definite quantitative measurement in
the field of mental disorders, or insanity, cannot 
be too greatly emphasized. Those who examine for 
symptoms of mental disorder are too prone to rely upon 
general observation and to regard such general methods 
as sufficient and accurate enough. This attitude has 
been fostered somewhat by the lack of accurate meas- 
uring devices for insanity symptoms and characteristics, 
but also by the too generally accepted notion that 
common sense is all one needs to appreciate the mani- 
festations of mental disorder. 

The complete measurement of mental disorder includes 
measurement of intelligence, of other mental traits 
which may be of particular importance in certain types 
of disorder, of personality, and of physical or physiologi- 
cal variants which are the basis of the disorder. For 
each of these we should know the expected measurement 
for the various classes of mental disorder. Unfortu- 
nately, no such standards are generally available. In 
fact, quantitative measurement in relation to insanity is 
only in its experimental stage. 

This chapter will be concerned principally with mental 
tests in insanity. Personality tests are discussed else- 
where, as are also physiological methods of measurement 
useful in diagnosing insanity. 



I. General Mental Tests 

Measurement of mental level is important to the stu- 
dent of mental disorders, because in many of the major 
psychoses there is an accompanying deterioration in intel- 
ligence. In this connection, lack of intelligence should 
not be confused with insanity. The two are by no means 
synonymous. An individual may lack intelligence with- 
out being insane, as in cases of pure feeblemindedness; 
or an individual may be insane while still retaining his 
general intelligence in most respects, as in the case of 
many individuals suffering from paranoia (a disorder 
characterized by persecutory delusions). Low intelli- 
gence simply implies inability to cope with abstract 
ideas, and inability to reason, make judgments, think 
logically, and generally adjust oneself to new problems. 
Insanity implies a condition involving pronounced 
changes in aspects of the individual’s personality. In 
connection with mental disorders, a knowledge of changes 
in intelligence may be of considerable diagnostic and 
prognostic value. Marked intellectual deterioration 
usually means considerable progress in the disease and a
rather insidious type of disease. Prognosis for recovery 
is markedly diminished if the disease is accompanied by 
deterioration of intelligence. 

The most commonly used test for determining the in- 
telligence level of the psychotic patient is the Binet Test 
(in America, the Stanford Revision of this test). Next
to this test come the various performance scales. The
popularity of the Binet Scale for testing among the
mentally disordered is related primarily to the fact that 
it is an individually administered scale. Of the scales 
for individual intelligence testing, it is probably the best 
test now generally available. For the younger years it
is very well standardized, and the test situations are
quite well adapted to the general interest and level of 
the individual. There is, however, serious need for a 
scale better adapted to adult level. Even for the adult 
whose intellectual level is only seven or eight years, the 
subject matter of the test is unsuited to the person’s gen- 
eral experience. For example, at year eight, two of the 
judgment questions are: “What’s the thing for you to 
do when you are on your way to school and notice that 
you are in danger of being late?” and “What’s the thing 
for you to do if a playmate hits you without meaning to 
do it?” The problems of making change in money, and 
many of the sentence comprehension tests, are also in 
child terms. The scale as a whole was constructed for 
children, and is largely in terms of child and school sub- 
ject matter. In the absence of better adapted scales, 
it has been widely applied in testing adults who may 
have the intelligence of the child but live in a different 
way and deal with a set of problems entirely different 
from those of the child. 

Practically the only intelligence scales better adapted 
in subject matter to adult problems are group scales 
entirely of the pencil-and-paper type. Such tests, how- 
ever, demand too continuous a voluntary attention on 
the part of the individual, whose mental disorder may 
have seriously interfered with his attention power; or 
they contain too few different types of mental response; 
or they handicap the individual whose lack or remoteness 
of school experience precludes facility in reading and 
writing. For uniform results in testing the mentally 
disordered it is practically necessary to have an individual 
test of the Binet type, in which various kinds of mental 
tests are included. 

The following examples of mental testing of two de-
mentia praecox patients illustrate two interesting fea-
tures often noticed in mental testing of the insane:

A case of intelligence deterioration. C. G., age 22 at 
time of commitment to institution as a dementia prae- 
cox. Her history describes her as having had many 
mannerisms from the early ’teens, such as tendencies 
to touch her face and teeth while constantly and con- 
tinuously moving the muscles of her face. During a 
period before her admission she had become very untidy 
in her appearance, claiming that it was wasteful to 
spend money on clothes. She had lost two jobs as a 
result of peculiar emotional outbursts. Intellectually 
her record showed good school work done through the 
elementary school and beginning of high school, an in- 
dication without definite test results of intellectual level 
to the extent of at least a mental age of 14 or 15. At 
the beginning of second-year high school work the his- 
tory shows a definite slipping in ability to do school 
work, and the patient had dropped out of school before 
the end of the year. Actual test results showed, on 
hospital observation and testing two years before com- 
mitment, a mental age of 13.6 years. At the time of the 
commitment the mental age by the Binet Scale was 12 
years 2 months, with a prognosis of likelihood of per- 
sistent deterioration. 

This case is typical of the intelligence findings in the
deteriorating psychoses, including the groups dementia 
praecox, paresis, and senile psychoses. The severer cases 
of epilepsy should also probably come in this deteriorat- 
ing group. In these cases successive mental tests prac- 
tically always show a decrease in mental ability. 

A case of marked “scatter” in results. E. F., age 25, 
diagnosed as dementia praecox. Description of the 
case shows typical emotional and personality picture 
of the dementia praecox. A Binet Test performance at 
age 25 shows the following results: 



8-year level and below ..... able to perform all tests.

9-year level ............... 5 tests out of 6 correct;
                             misses test of sentences
                             containing three words.

10-year level .............. 4 tests out of 6 correct;
                             misses absurdities test
                             and reading and report.

11- and 12-year level ...... 5 tests out of 8 correct.

13- and 14-year level ...... 1 test out of 6 correct;
                             solves arithmetic prob-
                             lems.

16-year level .............. 1 test out of 6 correct;
                             repeats digits.

The total mental age on the tests is 11 years, 6 months. 

This case is cited as an example of extreme “scatter” in 
test performance. A test result shows little scatter when 
the individual rather abruptly reaches the limit of his 
performance, that is, when his misses are confined to a 
range of a few years on the scale. A test result shows 
great scatter when the misses are spread through a 
range of several years on the scale; in the case just cited 
the range covers eight years. Scatter on a Binet Test is 
generally considered as of significance in diagnosis of 
psychotic cases. Psychotic cases usually show greater 
scatter than normal cases or cases of pure feebleminded- 
ness. The exact reason for this is not clear. It may be 
due to disturbance of attention factors entering into the 
test performance, but is more probably due to uneven- 
ness of deterioration in the various aspects of mental 
ability. 
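
The arithmetic behind the case of E. F. can be sketched as follows. The pass counts are those given above; the month credits per test are an assumption (the values usually quoted for the 1916 Stanford Revision), since the text does not state them, and counting the scatter range inclusively from the first to the last level with a miss is one reading of "the range covers eight years."

```python
# Results for case E. F. as given in the text:
# (year level, tests passed, tests at that level).
# The "11- and 12-year" and "13- and 14-year" levels are keyed as 12 and 14.
BASAL_YEARS = 8  # all tests passed at the 8-year level and below

LEVELS = [
    (9, 5, 6),
    (10, 4, 6),
    (12, 5, 8),
    (14, 1, 6),
    (16, 1, 6),
]

# ASSUMPTION: months of credit per test at each level. These are not
# stated in the text and are supplied only for illustration.
MONTHS_PER_TEST = {9: 2, 10: 2, 12: 3, 14: 4, 16: 5}


def mental_age_months(basal_years, levels, credit):
    """Basal age plus month credits for every test passed above the basal level."""
    extra = sum(passed * credit[level] for level, passed, _ in levels)
    return basal_years * 12 + extra


def scatter_years(levels):
    """Years spanned, counted inclusively, from the lowest to the highest level with a miss."""
    missed = [level for level, passed, total in levels if passed < total]
    return max(missed) - min(missed) + 1 if missed else 0


total = mental_age_months(BASAL_YEARS, LEVELS, MONTHS_PER_TEST)
years, months = divmod(total, 12)
print(years, months)
print(scatter_years(LEVELS))
```

Under these assumed credits the sketch gives 11 years 6 months, which agrees with the total reported for the case, and a scatter of eight years.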

Measurements in special realms or of special factors of 
mental ability have often proved valuable in dealing 
with the mentally disordered. We shall discuss memory 
tests and association tests as two of the most important 
of these measuring devices. 





II. Memory Tests 

Memory tests in clinical testing are often more than 
simply supplementary to general intelligence tests. 
There are certain mental disorders which manifest them- 
selves primarily in memory disturbances — the various 
amnesias. There are other mental disorders which mani- 
fest themselves in part in special disturbances of mem- 
ory, as the senile’s poor memory for recent happenings 
and relatively good memory for happenings of long ago. 
These and other reasons make it desirable to have sys- 
tematic quantitative means of observing memory dis- 
turbances. 

Wells¹ has described what appears to be a very adequate 
memory test, one which he has used at the Boston 
Psychopathic Hospital. This test includes the following 
parts: 

1. Old personal memories. 

2. Current events. 

3. Memory of common school information. 

4. Rote memory of alphabet and figures to 20. 

5. Memory of substitution problem. 

6. Repetition of sentences. 

7. Repetition of figures forward. 

8. Easy town-state knowledge. 

9. Hard town-state knowledge. 

10. Figures backward. 

11. Naming objects exposed for short time from mem- 
ory. 

12. Recognition, among a larger group, of twelve cards 
previously exposed. 

This test has been standardized so that standards of 
normality are known. Wells reports tests of 179 insane 

1 Wells, F. L., Mental Tests in Clinical Practice, World Book Co., 
Yonkers-on-Hudson, N. Y., 1927. 





individuals in which averages of the group as a whole 
showed performances about equal to normal perform- 
ances in parts 1, 2, 3, 4, 8, 11; performances somewhat 
below normal in 6, 7, 12; and performances considerably 
below normal in 5, 9, 10. When classified into groups 
of different types of insanity, the greatest departures 
from normal were shown by the “Organic Brain Condi- 
tion Group” and “Paresis Group” (mental disorder from 
syphilitic infection). 

III. Association Tests 

Association tests have been used for indicating various 
mental tendencies. They have been useful in studying 
disturbances of flow of thought; for giving a general 
indication of mental capacity; and for discovering the 
manner or process of building up tendencies to reaction 
in some of the pseudo-mental disorders which are simply 
cases of bad habit formation. Considerable information 
on associations of which a person may be capable can 
be brought out in conversation with him, if the conver- 
sation is directed to cover a sufficiently varied and wide 
field and if the words and expressions emitted in con- 
versation are carefully noted. For convenience, how- 
ever, and for more accurate investigation, several kinds 
of association tests have been devised. The most useful 
arrangement for this purpose is a standardized list of 
words (usually 100) to each one of which the subject 
must respond with the first word that comes into his 
mind. 

The Kent-Rosanoff list of words is one of the most 
used word association tests. Their list is reproduced in 
Table VI. Kent and Rosanoff have attempted to stand- 
ardize their test for normality of response. Their 
standards were made up by giving the list of 100 stimulus 



Table VI 

FREE ASSOCIATION TEST (KENT-ROSANOFF)² 

 1. Table         26. Wish          51. Stem          76. Bitter
 2. Dark          27. River         52. Lamp          77. Hammer
 3. Music         28. White         53. Dream         78. Thirsty
 4. Sickness      29. Beautiful     54. Yellow        79. City
 5. Man           30. Window        55. Bread         80. Square
 6. Deep          31. Rough         56. Justice       81. Butter
 7. Soft          32. Citizen       57. Boy           82. Doctor
 8. Eating        33. Foot          58. Light         83. Loud
 9. Mountain      34. Spider        59. Health        84. Thief
10. House         35. Needle        60. Bible         85. Lion
11. Black         36. Red           61. Memory        86. Joy
12. Mutton        37. Sleep         62. Sheep         87. Bed
13. Comfort       38. Anger         63. Bath          88. Heavy
14. Hand          39. Carpet        64. Cottage       89. Tobacco
15. Short         40. Girl          65. Swift         90. Baby
16. Fruit         41. High          66. Blue          91. Moon
17. Butterfly     42. Working       67. Hungry        92. Scissors
18. Smooth        43. Sour          68. Priest        93. Quiet
19. Command       44. Earth         69. Ocean         94. Green
20. Chair         45. Trouble       70. Head          95. Salt
21. Sweet         46. Soldier       71. Stove         96. Street
22. Whistle       47. Cabbage       72. Long          97. King
23. Woman         48. Hard          73. Religion      98. Cheese
24. Cold          49. Eagle         74. Whiskey       99. Blossom
25. Slow          50. Stomach       75. Child        100. Afraid


words to a thousand normal subjects and tabulating the 
responses of these subjects. Each normal word offered 
in response to a given stimulus word has a frequency 
index dependent upon the number of the thousand in- 
dividuals giving that response. Responses not listed in 
the frequency tables are termed “individual” reactions. 
The standards afford a measurement of the tendency of 
an individual to respond along the same lines as his fel- 
low-beings or to respond in ways peculiar to himself. 
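
The scoring idea just described can be sketched in a few lines. The frequency counts below are invented for illustration and are not taken from the Kent-Rosanoff tables; the function names are likewise hypothetical. A response's frequency index is taken to be the number of the thousand normal subjects who gave it, and a response absent from the table counts as an "individual" reaction.

```python
# HYPOTHETICAL frequency table: for each stimulus word, the number of the
# 1,000 normal subjects giving each response. These counts are made up for
# illustration and are NOT the published Kent-Rosanoff values.
FREQUENCY_TABLE = {
    "table": {"chair": 300, "wood": 100, "furniture": 50},
    "dark": {"light": 400, "night": 200, "black": 80},
}


def frequency_index(stimulus, response, table=FREQUENCY_TABLE):
    """Number of normal subjects (out of 1,000) who gave this response; 0 if unlisted."""
    return table.get(stimulus, {}).get(response, 0)


def percent_individual(responses, table=FREQUENCY_TABLE):
    """Percentage of a subject's responses that do not appear in the frequency table."""
    individual = sum(1 for s, r in responses if frequency_index(s, r, table) == 0)
    return 100.0 * individual / len(responses)


# One imaginary subject answering two stimulus words.
subject = [("table", "chair"), ("dark", "stairs")]
print(frequency_index("table", "chair"))
print(percent_individual(subject))
```

In actual use the table would hold the tabulated responses to all 100 stimulus words, and the percentage would be computed over the subject's full set of 100 responses.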

It is interesting to compare the “individual” reactions 


2 Reprinted, by permission, from Manual of Psychiatry, by Aaron 
Rosanoff, published by John Wiley and Sons, Inc., New York, 1927. 



to the Kent-Rosanoff words of insane patients with reactions 
of normal subjects (see Table VII). Only about 
7 per cent of the responses of normal adults are of the 
“individual” type, as compared with about 27 per cent 
of responses of the insane. 

Table VII 

RESPONSES OF VARIOUS GROUPS TO KENT-ROSANOFF ASSOCIATION TEST³ 

Subjects                          % Individual Reactions
1000 normal adults                         6.8
247 insane adults                         26.8
260 defective children                    13.0
125 normal white children                  8.6


Murphy⁴ has reported a study of Kent-Rosanoff responses 
of 250 normal individuals, 120 dementia praecox 
patients, and 82 manic-depressives. He classified 
their responses into thirteen categories, such as similarity, 
contrast, adjective-noun, rhyme and sound associations, 
etc. The manic-depressives gave more rhyme and sound 
associations than did the dementia praecox patients; 
but no marked association differences between the two 
insane groups are reported. 

3 Ibid. 

4 Murphy, G., “Types of Word Association in Dementia Praecox, 
Manic-Depressive and Normal Persons,” American Journal of Psychiatry, 
No. 2, p. 539, 1923. 



CHAPTER VII 


The Uses of Mental Tests 


IN THE three preceding chapters we have discussed 
the development of mental tests and the nature of 
the tests which have been constructed. We shall now 
describe various uses made of these tests and indicate 
some of the results of their use in several fields. The 
investigations of the use of mental tests have been so 
numerous that it would be impossible to describe or even 
mention all of them. In our discussion we shall not 
attempt to do more than to select representative studies. 

As we have already seen, the beginning of intelligence 
testing was related to the study of feeblemindedness, par- 
ticularly in the classroom. Mental tests later began to 
be used for dealing with mental deficiency met elsewhere 
— among delinquents or among workers. With the de- 
velopment of group testing methods and with the dem- 
onstration of the utility of mental tests for measurements 
at the level of normal and superior ability as well as of 
inferior ability, psychologists undertook studies of the 
mentality of school children in general, and began the 
application of mental tests to all levels of ability in the 
educational world. Still somewhat later came the ap- 
plication of mental tests in industry and business. It 
has lagged somewhat behind academic applications, but 
certain values of mental tests in the selection of em- 
ployees and in the classification of workers have now 
been generally recognized; and mental measurements are 



proving an important aid in the solution of problems 
confronting the personnel manager and the industrial 
executive. During much of this use of tests in schools, 
clinics, and industries, various special studies and re- 
search studies utilizing the tests have been carried on. 
Some of these have aimed at improving the tests them- 
selves, and some have aimed at finding out the relation- 
ships between intelligence and various other factors. 

I. Uses in College 

College students have probably taken as many intelli- 
gence tests as any other group. College instructors have 
usually been the ones to develop the instruments of 
measurement, and for that reason they have been par- 
ticularly interested in applying the tests to their own 
problems. Furthermore, the college student seems to 
have had more than his share of acting as the “guinea 
pig” for the trying out of new material. We thus find 
innumerable published studies discussing the uses of 
mental tests with college students. 

It is difficult to make an accurate statement regarding 
the extent of use of mental tests in colleges. In 1923-24 
Toops undertook an extensive survey in which he 
gathered information from colleges throughout the 
country regarding their use of mental tests. He then 
found that 60 per cent of colleges and universities in this 
country were making official use of mental tests. There 
seems to have been no similar survey at a more recent 
date, but it is likely that the figure at the present time 
would be somewhat over 60 per cent. It is also to be 
remembered that many of the colleges which make no 
official use of tests administer and use them informally 
at the behest of certain instructors, particularly in the 
psychology and education departments of the colleges. 





There are probably very few colleges in this country at 
the present time in which mental tests are not admin- 
istered and used in some way with a considerable number 
of the students. 

The employment of mental tests for general and ad- 
ministrative purposes in college includes primarily four 
types of uses: (1) as a partial basis for admission; (2) 
in dealing with low scholarship cases in connection with 
decisions as to dismissals or probation; (3) in dealing 
with disciplinary cases; (4) in appraisal of transfer 
credits from little-known institutions. 

The first of these uses is illustrated in a procedure 
which has been followed in the George Washington Uni- 
versity. At this institution the student who has at- 
tended an accredited secondary institution and has at- 
tained a preparatory school record in the upper part of 
his class is admitted to the University on the basis of 
this record. However, the student who applies for 
admission from an unaccredited preparatory school or 
from a school whose standards are little known, or the 
student who applies from an accredited preparatory 
school but whose record is relatively low, is required to 
take a mental test and an educational achievement test 
as a basis for his admission. It happens quite frequently 
that by this method students of superior mentality and 
considerable chance of college success are discovered and 
admitted to college, whereas otherwise they might have 
been denied admission. An example of such a student 
may be quoted from a group of case studies made at this 
institution. 

Case 19. Boy, age 18, graduated from a small high 
school with average grades. Interested in studying 
medicine. Applied for entrance to the University in 
September 1932. Percentile scores on the Thorndike 




Intelligence Test and the Iowa High School Content 
Test administered on application were respectively 71 
and 97. Admitted to college on the basis of these 
scores. Since he has been in college he has made a 
quality point index of plus 2.89 (almost a “B” average). 
He has worked and has paid a considerable proportion 
of his college expenses. He is a member of a 
social fraternity and of a professional chemistry fra- 
ternity. At the time of this study, which was at the 
end of his sophomore year, he was still registered in the 
University as a pre-medical student and was maintain- 
ing a good standing. 

Mental tests are of considerable advantage in dealing 
with low scholarship cases in college. Low scholarship 
and low mental ability would suggest in most instances 
the advisability of dropping the student from the college 
or university and encouraging his pursuit of some other 
type of training. On the other hand, low scholarship 
and high mental ability might suggest the advisability of 
investigating and removing, if possible, other reasons for 
the low scholarship. 

In addition to the four general and administrative uses, 
individual instructors and administrative officers in col- 
leges often make many other uses of mental test results. 
During the year 1933-34 Remmers, of the Psychology 
Department of Purdue University, made a study of the 
uses which instructors made of their test results by cir- 
culating a questionnaire among the staff members of the 
University. Table VIII, quoted from Remmers’ study,¹ 
shows the uses which the instructors indicated, with the 
percentage of the staff members making each of the uses. 
Most of the studies of mental tests in colleges, and the 


1 Remmers, H. H., “Report on the Uses Made of the Freshman Entrance 
Test Results at Purdue University,” Bulletin of Purdue University, 
Vol. XXXIV, No. 4, December 1933. 



Table VIII 

USES MADE OF MENTAL TESTS AMONG STAFF MEMBERS AT PURDUE UNIVERSITY 

                                                              Per Cent
Use                                                           Making Use
Encouraging extra effort in the case of unmotivated
  bright students ..........................................     46
Basis for individual consultation for improvement
  of study habits ..........................................     46
Determining amount of academic work to carry; a
  minimum for high students, a maximum for low .............     30
Advice to sororities and fraternities re prospective
  pledges ..................................................     26
For advising election of curricula or major subject ........     22
Encouraging capable students to undertake graduate
  work .....................................................     20
Determining amount of work for self-support ................     19
As partial basis for dismissal for low scholarship .........     14
Dealing with disciplinary (deportment) cases ...............     12
Making recommendations for scholarships or fellowships .....     12
Sectioning students according to capacity for progress .....     10
Appraisal of value of transfer credits .....................      8
As partial basis for admission .............................      6
Determining membership in honorary scholastic societies ....      6
Selection of assistants ....................................      5
Hiring student clerical help ...............................      6
Research purposes of graduate students or students
  in classes in “mental tests” .............................      5
Appointment of students to non-academic offices ............      6
Establishing control sections in research work .............      4
Satisfy curiosity ..........................................      2


type of study on which most of the uses are predicated, 
are those which investigate the relationship between 
intelligence test results and scholastic attainments of the 
students. Pintner² reports somewhat over seventy-five 
of the important studies that have been made of the re- 
lationship between mental tests and college grades. The 


2 Pintner, Rudolph, Intelligence Testing, Henry Holt and Company, 
New York, 1931, p. 302. 



correlation coefficients between these two variables as 
obtained in the various studies show a central tendency 
of about plus .45. There are very few correlations below 
.30 and very few above .60. About the highest values 
that have been claimed for mental tests in predicting 
college grades are those announced by Wood in his in- 
vestigation of the predictive value of the Thorndike 
Intelligence Test as utilized in Columbia University. He 
reports a correlation between the test and two-year 
scholarship of Columbia men of .67. This is markedly 
superior to the correlation which he reports for the same 
group between secondary school marks and college work, 
this correlation being plus .26. In general, we may state 
that the correlations between scholastic performance in 
college and mental test scores show that intelligence is 
one of the most important factors making for high marks. 
However, the relationships are sufficiently low to in- 
dicate that intelligence is by no means the only factor. 
Other factors, such as interest, health, effort, etc., also 
influence scholastic attainments. 
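
One common way of reading such coefficients, which is not used in the text itself, is to square them: the square of a correlation gives the proportion of the variance in one measure that is linearly associated with the other. The short sketch below applies this to the three coefficients quoted above; the function name and the labels are only illustrative.

```python
def shared_variance(r):
    """Proportion of variance in one measure linearly associated with the other (r squared)."""
    return r * r


# Coefficients quoted in the text; the labels are paraphrases.
reported = {
    "mental tests and college grades (central tendency)": 0.45,
    "Thorndike test and two-year scholarship (Wood)": 0.67,
    "secondary school marks and college work (Wood)": 0.26,
}

for label, r in reported.items():
    print(f"{label}: r = {r:.2f}, r squared = {shared_variance(r):.2f}")
```

On this reading a correlation of .45 corresponds to roughly one fifth of the variance in grades, which is consistent with the statement that intelligence is important but by no means the only factor.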

II. Uses in Schools Below the College Level 

In general, the lower schools in using mental tests have 
purposes similar to those of colleges. As in college, most 
of the uses are based upon the assumption that a meas- 
urement of mental ability is a good device for predicting 
attainments in school. Studies in high school and ele- 
mentary school have shown that the relationships be- 
tween grades in school and mental test records are about 
the same as relationships between scholastic attainment 
in college and mental test records. The correlations for 
these studies also are generally around .45. 

The student of mental tests will notice certain uses 
and certain types of studies that are perhaps rather peculiar 
to the lower schools. One of the most important 
of these is the marked use that has been made of mental 
tests for classifying pupils into ability sections. Many 
of our large school systems today have established sepa- 
rate classes for the superior, the average, and the inferior 
child of a given grade. Usually these classifications are 
based either entirely upon intelligence testing or upon a 
combination of intelligence measurement and educa- 
tional-achievement measurement. These classification 
schemes have proved to be of considerable value where 
variations in teaching method have been made to suit 
the mental ability of the group, or where separate cur- 
ricula have been worked out for the different groups. 

In the lower schools mental tests have frequently fig- 
ured in special studies, as: survey studies in which whole 
school systems have been measured; studies which aim at 
making certain comparisons in mental ability of school 
children, such as comparisons from school to school or be- 
tween city and rural or private and public schools; and 
studies which have aimed at finding out the amount of 
overlapping from grade to grade in the elementary or 
high school. 

III. Uses of Mental Tests in Vocational Selection 

Next to the use in schools, the greatest use of mental 
tests has probably been made in the selection of per- 
sonnel in various industries and various departments of 
government. The use of mental tests for this purpose 
has lagged somewhat behind application in the schools, 
and even today we find the percentage of industries and 
personnel agencies in the government making use of 
mental tests considerably smaller than the percentage 
of schools making use of tests. 

One of the most comprehensive studies of employment 



methods used in American private personnel management 
was made by Stanley B. Mathewson.³ He sent 
questionnaires regarding personnel practices to 500 na- 
tionally known business concerns, picked at random. He 
received 195 answers. The 195 companies represented 
were located in 21 states and employed 2,391,000 workers, 
the numbers in the separate organizations ranging from 
100 to 240,000. Forty-one different kinds of business 
enterprises were represented. In this survey he found 
that psychological tests were included among the em- 
ployment methods of only 17 per cent of the firms. It 
may be of interest to note that the method for selection 
used in the largest percentage of firms was that of inter- 
views, these being employed in 93 per cent of the firms. 

Two years ago, Hubbard⁴ conducted a survey of employment 
methods used by various state governments. 
Questionnaires were sent to each of the 48 states and 
replies were received from 36. Of the 36 states replying, 
7 had civil service commissions, 4 reported some form of 
centralized personnel agency other than a civil service 
commission, and 25 were without any centralized per- 
sonnel agency. The results obtained are shown in Table 
IX. It will be noticed that intelligence tests are reported 
as being used in 7 of the 11 having some centralized per- 
sonnel agency, but in only 1 in 25 of those without any 
centralized personnel agency. Obviously the use of 
mental tests in selecting personnel is related to the 
general development of progressive personnel methods. 

The chief purpose of mental testing in vocational se- 
lection is to predict the fitness of prospective applicants 


3 Scott, W. D., Clothier, R. E., and Mathewson, S. B., Personnel 
Management, A. W. Shaw Company, Chicago, 1931. 

4 Hubbard, Henry Furness, A Study of Objective Employment Tests 
in the New Jersey Civil Service, unpublished thesis, 1934. 





for the work for which they are applying. Fitness for 
work refers to that combination of traits and abilities 
which will enable the individual to maintain a satisfac- 
tory output of work with a minimum of expenditure of 
energy and with the least evidence of maladjustment to 
the job. Fitness for work is determined primarily by 
four factors: (1) proficiency, knowledge, or skill; (2) 
capacity, competency, or general ability; (3) personality 
and character; and (4) interest in the work. General 
mental tests or intelligence tests have a place in meas- 
uring the second of these factors, and it is the measure- 
ment of this factor with which we are concerned in the 
present discussion. The others of these factors will receive 
discussion elsewhere in the book.⁵ 


Table IX 

EMPLOYMENT METHODS USED BY STATES 

                                       Number of States
                               With centralized     Without
                               personnel            centralized
Employment Method Used         agency               agency
Test of knowledge of duties          9                  1
Intelligence tests                   7                  1
Aptitude tests                       6                  1
Oral interviews                     10                 12
Experience qualifications           10                  8
Educational qualifications           8                  4
Medical examinations                 8                  2
Physical examinations                5                  1
Recommendations                      6                  8
Application blanks                  11                  4
Number of states                    11                 25


Two types of studies have been of particular importance 
to the utilization of mental tests in personnel selection. 
The first is the study of intelligence levels of various 

5 See Chapters XI, XIV, and XIX. 

occupations. It seems quite obvious that if we are 
to utilize mental tests for selecting employees we should 
know what levels of intelligence are suitable for the 
various occupations. One of the most comprehensive 
studies of the intelligence level of men in various occu- 
pations is that made from the results of the Army Intel- 
ligence Tests. From the occupations of the men who 
were drafted in the Army, it was possible to make a dis- 
tribution of the Army intelligence ratings and to com- 
pute the median scores for a great many occupations. 
Fryer later worked over the data afforded by the Army 
study, amplifying it in some instances by his own testing, 
and constructed a table showing the average scores and 
range of the middle 50 per cent of scores for a large 
number of occupations. Table X is quoted from Fryer’s 
study. Such studies as these are very valuable, but they 
do not answer all the questions that we need to know 
about intelligence levels in occupations. Most of the 
studies show only the situation as it exists at the present 
time. The results are obtained by measuring the intel- 
ligence of persons on the job at the time of the study. 
They do not take into account the fact that there may be 
many individuals on the job who are too intelligent for 
the work, and many who are too low in ability for satis- 
factory work. At best, such studies can be only rough 
guides for setting up critical points below which or above 
which we would not desire to select employees. 

The second type of study of immense importance to 
the utilization of mental tests in the selection of em- 
ployees is the study of degree of relationship between 
mental ability and success on the job. Most of these 
studies have been done in terms of correlations between 
the intelligence test scores of employees who are selected 
to fill positions, and some measurement of their success on 



Table X 

INTELLIGENCE LIMITS FOR VARIOUS OCCUPATIONS⁶ 

   Army Alpha Score
             Range of Middle
Average      50% of Scores      Occupation
  161          110-183          Engineer (civil and mechanical)
  152          124-185          Clergyman
  137          103-155          Accountant
  127          107-164          Physician
  122           97-148          Teacher (public schools)
  119           94-139          Chemist
  114           84-139          Draftsman
  111           99-163          Y.M.C.A. Secretary
  110           80-128          Dentist
  109           81-137          Executive (minor)
  103           73-124          Stenographer and typist
  101           77-127          Bookkeeper
   99           78-126          Nurse
   96           74-121          Clerk (office)
   91           69-115          Clerk (railroad)
   86           59-107          Photographer
   85           57-110          Telegrapher and radio operator
   83           64-106          Conductor (railroad)
   82           57-108          Musician (band)
   81           59-106          Artist (sign letterer)
   81           60-106          Clerk (postal)
   81           57-109          Electrician
   80           62-114          Foreman (construction)
   80           56-105          Clerk (stock)
   78           54-102          Clerk (receiving and shipping)
   78           61-106          Druggist
   77           59-107          Foreman (factory)
   75           56-105          Graphotype operator
   74           53-91           Engineer (locomotive)
   72           54-99           Farrier
   70           46-95           Telephone operator
   70           44-94           Stock checker
   69           49-93           Carpenter (ship)
   69           48-94           Handyman (general mechanic)
   69           46-90           Policeman and detective
   68           51-97           Auto assembler
   68           47-89           Engineman (marine)
   68           42-86           Riveter (hand)
   67           50-92           Toolmaker
   66           45-92           Auto engine mechanic
   66           45-91           Laundryman
   66           49-86           Gunsmith
   66           44-88           Plumber
   66           44-88           Pipefitter
   65           44-91           Lathe hand (production)
   65           43-91           Auto mechanic (general)
   65           43-91           Chauffeur
   65           42-89           Tailor
   65           44-88           Carpenter (bridge)
   64           43-88           Lineman
   63           40-89           Machinist (general)
   63           46-88           Motorcyclist
   63           41-86           Brakeman (railroad)
   62           31-94           Actor (vaudeville)
   61           40-85           Butcher
   61           44-84           Fireman (locomotive)
   61           39-82           Blacksmith (general)
   60           38-94           Shop mechanic (railroad)
   60           36-93           Printer
   60           40-84           Carpenter (general)
   59           40-87           Baker
   59           39-83           Mine drill runner
   59           38-81           Painter
   58           37-85           Concrete worker
   58           40-83           Farmer
   58           37-83           Auto truck chauffeur
   58           37-82           Bricklayer
   57           41-81           Caterer
   57           39-71           Horse trainer
   56           38-76           Cobbler
   55           35-81           Engineman (stationary)
   55           34-78           Barber
   55           35-77           Horse hostler
   52           38-96           Sales-clerk
   52           33-74           Horseshoer
   51           31-79           Storekeeper (factory)
   51           25-77           Aeroplane worker
   51           31-74           Boilermaker
   50           33-75           Rigger
   50           30-72           Teamster
   49           40-71           Miner (general)
   48           21-89           Station agent (general)
   40           19-67           Hospital attendant
   40           19-60           Mason
   35           18-62           Lumberman
   36           19-67           Shoemaker
   32           16-59           Sailor
   31           20-62           Structural steel worker
   31           19-60           Canvas worker
   30           1?-41           Leather worker
   27           19-63           Fireman (stationary)
   27           17-67           Cook
   26           18-60           Textile worker
   22           16-46           Sheet metal worker
   21           13-47           Laborer (construction)
   20           15-51           Fisherman

6 From Fryer, D., “Occupational Intelligence Standards,” by permission 
of School and Society. 


the job. A great many of these studies have been made 
on clerical and stenographic workers. Results reported 
by Bills⁷ seem typical of these studies. She found the 
correlation between intelligence and job success with 133 
clerical workers to be +.22. She noted that those making 
the lowest scores tended to leave and that those 
making the highest scores tended to go into higher types 
of jobs. The same investigator studied intelligence in 
relation to selection of stenographers and typists.⁸ She 
expresses the opinion that the general intelligence test 
is the most efficient among a number of tests which she 
used. In her study, those individuals who were working 
as secretaries attained an average score on her test of 
144; stenographers rated as “good,” 110; stenographers 
rated as “getting along,” 65; and stenographers failing 
in their work, 63. 

7 Bills, M. A., “Relation of Mental Alertness Test Score to Positions 
and Permanence in Company,” Journal of Applied Psychology, Vol. 
VII, pp. 154-156. 

8 Bills, M. A., “Methods for the Selection of Comptometer Operators 
and Stenographers,” Journal of Applied Psychology, Vol. V, pp. 276- 



Hubbard reports typical studies of relationship between 
intelligence tests and success in certain jobs in the 
public service. He studied employees in the New Jersey 
Civil Service jurisdiction holding jobs as prison and reformatory 
officers, state patrolmen, firemen and bank 
examiners. He reports the correlations between success 


Table XI 

VALIDITY OF VARIOUS EMPLOYMENT METHODS 

                                                          Validity
Recruiting Method                                         Coefficient

Prison and Reformatory Officer:
  Final Average (Exclusive of Veterans’ Preference)       .47 ± .06
  Medical and Physical Test                               .20 ± .07
  Oral Interview                                          .24 ± .07
  Experience and Training                                 .05 ± .07
  Intelligence Test                                       .34 ± .07
  Objective Duties Test                                   .51 ± .06

Patrolman:
  Final Average (Exclusive of Veterans’ Preference)       .51 ± .07
  Medical and Physical Test                               .25 ± .09
  Experience and Training                                 .09 ± .09
  Intelligence Test                                       .50 ± .07
  Objective Duties Test                                   .49 ± .07

Fireman:
  Final Average (Exclusive of Veterans’ Preference)       .44 ± .08
  Medical and Physical Test                               .15 ± .09
  Experience and Training                                 .17 ± .09
  Intelligence Test                                       .31 ± .09
  Objective Duties Test                                   .33 ± .08

Bank Examiner:
  Final Average (Exclusive of Veterans’ Preference)       .38 ± .09
  Oral Interview                                          .32 ± .13
  Experience and Training                                 .36 ± .12
  Free Answer Test                                        .36 ± .12
  Intelligence Test (used for only one grade)             .38 ± .12
  Objective Duties Test                                   .39 ± .09


on the job for these employees and various selecting 
methods as shown in Table XI. It will be noticed that 
in most of these instances the best single method of se-
lecting employees is either by the intelligence test or by
the objective duties test. In one of these four types of 
positions, the mental test alone gives almost as high re-
lationship with success on the job as is given by con-
sideration of the average of all the factors studied.
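
The comparison drawn from Table XI can be checked with a short Python sketch. The coefficients below are transcribed from the table (probable errors omitted); the choice to leave out the Final Average, on the ground that it is a composite rather than a single method, and the names used are assumptions made for illustration only.

```python
# Validity coefficients as read from Table XI (probable errors omitted).
# "Final Average" is excluded because it is a composite, not a single method.
TABLE_XI = {
    "Prison and Reformatory Officer": {
        "Medical and Physical Test": 0.20,
        "Oral Interview": 0.24,
        "Experience and Training": 0.05,
        "Intelligence Test": 0.34,
        "Objective Duties Test": 0.51,
    },
    "Patrolman": {
        "Medical and Physical Test": 0.25,
        "Experience and Training": 0.09,
        "Intelligence Test": 0.50,
        "Objective Duties Test": 0.49,
    },
    "Fireman": {
        "Medical and Physical Test": 0.15,
        "Experience and Training": 0.17,
        "Intelligence Test": 0.31,
        "Objective Duties Test": 0.33,
    },
    "Bank Examiner": {
        "Oral Interview": 0.32,
        "Experience and Training": 0.36,
        "Free Answer Test": 0.36,
        "Intelligence Test": 0.38,
        "Objective Duties Test": 0.39,
    },
}


def best_single_method(coefficients):
    """Return the (method, coefficient) pair with the highest validity."""
    return max(coefficients.items(), key=lambda item: item[1])


best = {job: best_single_method(methods) for job, methods in TABLE_XI.items()}
for job, (method, r) in best.items():
    print(f"{job}: {method} (r = {r:.2f})")
```

On these figures the highest single coefficient in each of the four positions belongs to either the intelligence test or the objective duties test.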

Under the guidance of O’Rourke, the United States 
Civil Service Commission has developed a series of men-
tal (general adaptability) tests, and O’Rourke ® has re-
ported satisfactory use of these as part of the method of
selecting applicants for a considerable number of federal 
government jobs. 


Table XII

THE PROPORTION OF SUCCESSES AND FAILURES AT EACH
MENTAL AGE LEVEL FOR 413 PACKING JOBS

                  Total Number     Number           Number
                  of Cases at      Successes at     Failures at
Mental Age        Mental Age       Mental Age       Mental Age

 5 to  5-11             1                1                0
 6 to  6-11             5                4                1
 7 to  7-11            13               13                0
 8 to  8-11            48               46                2
 9 to  9-11            67               65                2
10 to 10-11            82               78                4
11 to 11-11            58               57                1
12 to 12-11            53               52                1
13 to 13-11            42               42                0
14 to 14-11            20               19                1
15 to 15-11             9                9                0
16 to 16-11            11               11                0
17 to 17-11             3                3                0
18 to 18-11             1                1                0


It is only fair to point out that despite such studies as 
we have just mentioned, intelligence is not found to be 
an important factor and mental tests are not found to 
be even moderately related to success in some jobs. This 
is particularly true of various low-grade factory jobs. A 

• See Annual Reports by L. J. O’Rourke, Director, Research Division, 
United States Civil Service Commission, to present. 





typical study of this sort is one by Unger and Burr.“ 
Table XII shows the proportion of successes and failures 
among 413 persons doing packing jobs in an industry. 
Success is about as likely at the low mental ages as at the 
high ones. 
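
The claim can be examined directly from the counts in Table XII. The sketch below is only illustrative: the counts are transcribed from the table (reading the garbled band between 14 and 16 as mental age 15), and the division into "low" and "high" mental ages at 11 years is an assumption, not a division made by Unger and Burr.

```python
# (mental age in years, total cases, successes) as read from Table XII.
TABLE_XII = [
    (5, 1, 1), (6, 5, 4), (7, 13, 13), (8, 48, 46),
    (9, 67, 65), (10, 82, 78), (11, 58, 57), (12, 53, 52),
    (13, 42, 42), (14, 20, 19), (15, 9, 9), (16, 11, 11),
    (17, 3, 3), (18, 1, 1),
]


def success_rate(rows):
    """Pooled proportion of successes over the given rows."""
    total = sum(cases for _, cases, _ in rows)
    successes = sum(won for _, _, won in rows)
    return successes / total


# Assumed cut for illustration: mental ages 5-10 against 11-18.
low = [row for row in TABLE_XII if row[0] <= 10]
high = [row for row in TABLE_XII if row[0] >= 11]

print(f"all cases:               {success_rate(TABLE_XII):.3f}")
print(f"mental age below 11:     {success_rate(low):.3f}")
print(f"mental age 11 and above: {success_rate(high):.3f}")
```

With this cut the pooled proportions of success are about 96 per cent for the lower group and about 98 per cent for the higher group, a difference small enough to agree with the statement in the text.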

It seems that we may summarize the values of mental 
tests in the selection of employees somewhat as follows: 
For most jobs at the professional, technical, or office 
level, intelligence, or general mental ability, seems to be 
a significant factor. It may be significant in the sense 
that a minimum level is required and that success above 
this is dependent on other factors; it may be significant 
in that degree of intelligence is directly proportional to 
degree of success all along the line; or it may be signifi- 
cant because of its relation to promotional possibilities 
in the organization. For most jobs of the routine factory 
type, there seems to be relatively little relationship be- 
tween mental level and success. If there is a minimum 
level below which success is impossible, it is so low that 
the problem arises too seldom to justify mental measure- 
ments of all applicants. It should be emphasized that 
in personnel problems general mental tests should be 
supplemented by other measurements of human traits 
and capacities. Intelligence tests and intelligence test 
studies in industry have in many instances pointed the 
way to development of special vocational tests and apti- 
tude tests. 

IV. Uses of Mental Tests with Delinquents 
and Criminals 

Long before the development of any quantitative 
methods of measuring human traits, those who dealt 

Unger, E. W., and Burr, E. T., Minimum Mental Age Level of
Accomplishment, University of the State of New York, Albany, 1931, p.
108. 





with law-breakers were interested in the makeup of the 
criminal. Hence, with the advent of intelligence tests, 
criminals were among the early special groups studied. 
The studies have aimed at answering several questions 
about the criminal. Is he always, or commonly, men- 
tally deficient? Is the mental defective always a poten- 
tial criminal? Is there a relationship between type of 
crime and mental deficiency? What relationship exists 
between intelligence and possibilities of rehabilitation of 
the criminal? Not all the problems are yet fully 
answered, but the application of measures of intelligence 
has aided in the answering of some of them. 

The statistical results. Most of the studies that have 
been made are based upon the testing of convicted delin- 
quents or criminals in various types of institutions, such 
as reformatories, industrial schools, jails, prisons, peni- 
tentiaries. Most of the results have been expressed in 
terms of either the percentage of feeblemindedness found 
among the criminal groups or the average intelligence 
of the criminal group. Most often the tests applied 
have been either the Binet Test or the Army Intelligence 
Tests. The studies have shown widely different results. 
Percentages of feebleminded range from below 10 to 
over 90 per cent. Pintner lists 42 studies of feeblemind- 
edness among delinquent children and 13 of delinquent 
adults. The highest percentage of feeblemindedness (93 
per cent) was reported by Hill and Goddard in 1911. 
The lowest reported is 7 per cent, found by Miner in 
1918 and Healy in 1922, both studying juvenile delin- 
quents. Average percentages for all studies are about 
30 for both children and adults. For the more recent 
studies, which we may consider somewhat more reliable, 
averages are closer to 20. 





The extreme variations of the studies are probably to 
be accounted for by such factors as type of criminal 
tested, locality in which tests are made, type of mental 
test used, and representativeness of cases. The varia- 
tions make it difficult to arrive at definite conclusions, 
but we can be fairly certain that feeblemindedness is 
more prevalent among criminals in general than among 
the population at large. (Percentage among the popu- 
lation at large is near one per cent). 

There have been several studies of the relation between 
type of crime and intelligence. Table XIII is quoted. 

Table XIII

PERCENTAGE OF VARIOUS CRIMINALS
WHO ARE MENTALLY INFERIOR

                       Per Cent Inferior
                       (Below C, as Used
Crime                   in Army Tests)

Fraud                       22.0
Force                       30.6
Statutory                   31.0
Thievery                    31 S
Physical injury             36.9
Dereliction                 43.1
Sex                         47.6


from a study by Murchison to show the types of dif- 
ferences usually found. Almost every type of crime 
seems to involve all degrees of intelligence. There is a 
tendency, however, for crimes against person and crimes 
of brutality to be committed by individuals of lower in- 
telligence than crimes against property involve. 

Murchison, C., Criminal Intelligence, Clark University, Worcester, 
Mass., 1926. 





V. The Use of Mental Tests in Various Theoretical 
and Research Problems 

There are many psychological problems of theoretical 
importance or of only indirect practical bearing upon the 
problems of school, industry, or society that have been 
furthered or indeed made possible by the development 
of mental measurements. We mention a few of these. 

1. Studies of mental growth. Two questions about 
mental growth have long interested psychologists. At 
what rate does mental growth take place at various points 
throughout the mental growth span: is the rate uniform
or does it vary at different ages? At what age is mental 
maturity reached and growth stopped? Before the ad- 
vent of mental tests, no accurate method was available 
for the study of these questions. Now a wealth of mate- 
rial has been collected in the form of mental test records 
for various ages. These records have generally shown 
increasing mentality up to an age somewhere near 
twenty, or somewhat above; furthermore, they have 
generally shown a decreasing rate of growth as age pro- 
gresses. Thorndike has pictured the general nature of 
this growth as in Fig. 6.’* Yet the questions about men- 
tal growth are far from settled at the present time. 
Pintner aptly points out the difficulties: 

We must note, here, the great difficulty of arriving 
at any definite conclusion from such studies. If we 
take the average scores at each age as indicative of 
mental growth, we make two assumptions: first, that 
the score units are equal at all levels of the test, which 
is very doubtful; second, that the selection of children 


** Thorndike, E. L., and others, The Measurement of Intelligence, 
Bureau of Publications, Teachers College, Columbia University, New 
York, 1927, p. 466.





at each age is the same, which is also very doubtful. 

The real curve of mental growth will not be obtained 
until we have a measure of intelligence with equal units 
at all levels and until we have repeated annual meas- 
ures of a great number of cases." 

Fig. 6. Mental Growth Curve. (Axis label: Age.)

2. Studies of the constancy of relative mental ability.
Is a person’s relative mental ability the same throughout
life? Is the child of average I. Q. also the man of average
intelligence, or may relative ability change? These
questions are of vital importance to our utilization of 
mental test results. If relative ability is likely to change, 
then our predictions on the basis of mental testing can 
be of only immediate value ; remote or long-time predic- 
tions would be subject to great inaccuracies. Answers 
to the questions are contained in retests of individuals 
made considerable time after original tests. Generally 

Pintner, R., Intelligence Testing, Henry Holt & Co., New York, 
1931, p. 81. 






such studies have shown surprisingly constant relative 
abilities of the testees. I. Q.’s (one of our commonest
ways of expressing relative ability) remain closely the 
same from year to year. Typical studies concerning the 
constancy of I. Q.’s are shown in Table XIV, a summary 
taken from Freeman.^^ 

Table XIV

MEASURES OF THE VARIATIONS IN THE I. Q. ON
RETESTING AS FOUND IN SEVERAL TYPICAL STUDIES

                             Percentage      Limits of                  Coefficient of
                             Differing       Middle         Average     Correlation
                   No.       10 Points       50 Per Cent    Change      between
Author             Cases     or More                                    Two Tests

Terman              436        .16           -93 to +6.7      4.5          .93
Rugg & Colloton     137        .12           -2.3 to +5.6     4.7          M
Garrison            468        .085          -2 to +4         5.4          .88
                                             -3 to +4
                                             -3 to +6
Rugg, L. S.         114                      -13 to +1.9      3.1          .96
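
The columns of Table XIV can be made concrete with a small Python sketch. It rests on assumptions that should be stated plainly: the I. Q. pairs are invented for illustration, "average change" is taken to mean the mean change without regard to sign, and the limits of the middle 50 per cent are taken as the first and third quartiles of the signed changes. The studies summarized by Freeman may have computed these quantities somewhat differently.

```python
import statistics


def retest_summary(first, second):
    """Summarize I. Q. changes between a first test and a retest."""
    changes = [b - a for a, b in zip(first, second)]
    # Quartiles of the signed changes bound the middle 50 per cent of cases.
    q1, _, q3 = statistics.quantiles(changes, n=4)
    return {
        "cases": len(changes),
        "share_10_or_more": sum(abs(c) >= 10 for c in changes) / len(changes),
        "middle_50_limits": (q1, q3),
        "average_change": statistics.mean([abs(c) for c in changes]),
    }


# Hypothetical I. Q.s for eight children on a first test and a later retest.
first_test = [95, 102, 110, 88, 120, 100, 93, 107]
retest = [98, 100, 113, 90, 108, 101, 97, 104]

summary = retest_summary(first_test, retest)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The coefficient of correlation in the last column of the table is a separate computation by the product-moment method and is not included in this sketch.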


3. Studies of heredity. Since Galton’s classic study 
of the inheritance of mental traits, described in his book 
Hereditary Genius, psychologists have pursued the ques- 
tion of the influence of heredity in determining mental 
ability. Most of the recent studies have made use of 
the correlation method, and of some type of mental test 
for indicating amount of intelligence. If the coefficient 
of correlation between mental test scores approaches 
zero, there is no resemblance in ability among the indi- 


Freeman, Frank N., Mental Tests, Houghton Mifflin Co., New
York, 1926, p. 346. 





viduals compared. Randomly selected pairs of unrelated 
individuals show zero correlations for intelligence test 
scores, since there is no factor making for a resemblance 
among such individuals. Among pairs of related indi- 
viduals, higher correlations are found, and the closer the 
degree of blood relationship between the pairs studied, 
generally the higher is the correlation coefficient. Such 
studies are not absolute proof of the influence of heredity. 

But since environmental factors in most of the studies 
have been very similar, the assumption that the great 
resemblance in mentality of closely related individuals, 
as compared with less closely related or unrelated ones, 
is due largely to heredity does not seem to be unreason-
able. Studies such as those made by Merriam, Hildreth, Hol-
zinger, and Thorndike show correlations of mental test
records for identical twins of about .90, for fraternal 
twins of .70, and for siblings of about .50. Contrast these 
with correlations of zero for individuals paired at random. 

4. Other studies using mental tests. Many other 
types of studies make use of mental tests. The scope of 
this book does not permit a discussion of them all. A 
few may be mentioned as suggestions for those who may 
wish to pursue them further. They include racial com- 
parisons on mental tests, studies of relation between 
mentality and social status, sex comparisons in mental 
ability, studies of relationships between mental meas- 
urements and various physical and personality measure- 
ments, and many studies aimed at the improvement of 
mental tests themselves. 




Part III

MEASUREMENT OF APTITUDES 




CHAPTER VIII 


The Measurement of Special Talents 

The instruments for measurement in the field of
special talent are few, owing mostly to the nature 
of the things to be measured and to the infrequency of 
occurrence in the same individual of a sufficient amount 
of the talent or interest in the talent, on the one hand, 
and of psychological technique on the other, to make 
him an efficient investigator. Seashore, a pioneer in 
this field, working on the measurement of musical talent, 
stated in this way the problem which he faced: 

The scientific study of the artistic mind is a some- 
what baffling undertaking. There are no substantial 
precedents; the available scientific data are extremely 
meager; by nature the artist himself is but little inter- 
ested in the process of his mental dissection; and, after 
all, the varieties of artistic minds are legion. But the 
time is ripe for a vigorous application of the technique 
of psychological inventory to practical affairs, and the 
discovery and fostering of human talents is indeed 
both practical and practicable. The stress of war 
forced our army to adopt psychological methods for the 
selection and rating of the human energies of men for 
assignment to service and for promotion. When the 
best results are demanded in any occupation, haphazard 
procedure must give way to procedure on the basis of 
ascertained facts. When Music shall come to her own 
she will come to the musically gifted: to that end mu-
sical talent must be revealed and encouraged.¹

1 Seashore, C. E., The Psychology of Musical Talent, Silver, Burdett
and Co., New York, 1919. 




I. The Measurement of Musical Aptitude 

As an example of measurement in the field of special 
talents, we shall examine Seashore’s measurement of 
musical ability. The first mention of such standardized 
tests as this was made by Seashore in 1901. Years of 
research followed, during which the plans for these tests 
were experimentally developed in the psychological labo- 
ratory. As a final result, a battery of standardized musi- 
cal tests was made available in 1919. 

The problem of measurement of musical talent con- 
sisted, first, of an analysis of the elements making up 
musical ability; then, the discovery of objective ways of 
measuring them; and, finally, the standardization of the 
means of measurement. Seashore states that an inven- 
tory of the “musical mind” includes elements that have 
to do almost wholly with the reception of various at- 
tributes of sound: pitch, intensity, duration, and exten- 
sity — a problem of musical sensitivity; and elements 
that depend upon more intellectual qualities — a problem 
of musical appreciation, memory, and imagination. The 
first of these two types of elements is the more funda- 
mental in the measurement of native capacity. The 
other type is dependent upon the first; is more subject to 
modification; is more difficult of analysis into its separate
components; and is more mixed with other intellectual
traits, no more related to musical performance than to
other things the individual does. Five of the six ele- 
ments measured in the Seashore tests are of the sensitiv- 
ity group; one has to do with musical memory. 

1. Description of the test. The six parts of the Sea- 
shore test measure the following: (1) sense of pitch; (2) 
sense of intensity; (3) sense of time; (4) sense of 
rhythm; (5) sense of consonance; (6) musical memory.
These six parts of the test are described by Seashore as 
follows: 

The sense of pitch. The sense of pitch is involved, 
not only in the hearing of melody and harmony, but 
also in the hearing of tone character in many complex 
forms. Pitch is the raw material of music. The func- 
tion of the higher capacities, such as memory, imagina- 
tion, and feeling, or playing and singing, is limited by 
the degree of sensitiveness to pitch. This becomes sig- 
nificant when we find, for example, that, according to 
actual measurement, one person may be two hundred 
times as sensitive to pitch as another of equal age, so- 
cial standing, and general intelligence. 

The sense of intensity. Then, we have the sense of 
intensity, which represents the capacity for appreciation 
of differences in strength of sound. This is basic for 
the hearing of musical expression and the appreciation 
of touch, and for modulation in intensity or loudness 
and volume. 

The sense of time. The third elemental capacity is 
the sense of time. This is basic for all perception of 
rhythm and for rhythmic action. A limitation in this 
capacity for hearing time sets a corresponding limita- 
tion upon feeling, thought, and action. 

The sense of rhythm. The sense of rhythm rests 
upon the sense of time, the sense of intensity, and men- 
tal imagery, but it requires in addition a number of 
affective and motor qualifications; thus a person may 
have a keen sense of time and intensity and still not 
have a pronounced sense of rhythm. 

The sense of consonance. The sense of consonance is 
the simplest form of musical hearing which underlies
the combination of tones, either simultaneous or suc- 
cessive, as in melody or harmony. This rests primarily 
upon a sense of pitch, but involves higher elements so 
that a person may have a keen sense of pitch and yet 
not be effective in the sense of consonance. 

Musical memory. The need of a musical memory is 
self-evident. It is not merely a matter of recalling se-
lections. Memory enters intricately into all stages of
hearing, feeling, and rendering of music. The learning 
process is one special aspect of memory. Each indi- 
vidual has a certain personal equation for capacity in 
rate and excellence of learning, and each of us has some 
apt preference for one kind of material or another. For 
a given activity, such as singing, sight reading, piano 
exercises, this may be expressed in the form of what is 
technically called a learning curve. 

2. How the six qualities are measured. The six parts 
of the Seashore test are recorded on phonograph records 
and the test is taken by having the subjects listen to the 
records and answer questions which would indicate their 
ability to discriminate differences in pitch, intensity, 
time, and so forth. 

The sense of pitch, in terms of pitch discrimination, is 
measured by means of a series of tuning forks used with 
resonators. These forks are tuned in a differential series 
in which the standard is a fork with a vibration of 435. 
In the whole series there are ten forks which vary from 
the standard by different degrees of pitch. In the test 
the standard fork and one of the other forks are sounded 
in quick succession in front of a resonator. The problem 
of the listener is to tell whether the second tone is higher 
or lower than the first, and in taking the test he simply 
records “H” or “L” for each combination as he listens to 
it from the phonograph record. 
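
The scoring of such a record can be sketched in a few lines of Python. This is a minimal illustration and not Seashore's published scoring procedure: the comparison frequencies and the listener's answers are invented, and it is assumed that the standard fork (435 vibrations) is always sounded first.

```python
STANDARD_HZ = 435.0  # standard fork, as stated in the text


def correct_answer(second_hz, standard_hz=STANDARD_HZ):
    """Return 'H' if the second tone is higher than the standard, else 'L'."""
    return "H" if second_hz > standard_hz else "L"


def score_pitch_test(second_tones_hz, responses):
    """Return the proportion of trials the listener judged correctly."""
    if len(second_tones_hz) != len(responses):
        raise ValueError("one response is needed per trial")
    hits = sum(
        correct_answer(hz) == answer.upper()
        for hz, answer in zip(second_tones_hz, responses)
    )
    return hits / len(responses)


# Hypothetical comparison tones (vibrations per second) and recorded answers.
trials = [436.0, 433.0, 440.0, 434.5, 435.5, 430.0]
answers = ["H", "L", "H", "H", "L", "L"]

score = score_pitch_test(trials, answers)
print(f"proportion correct: {score:.2f}")
```

In this invented record the listener misses the two smallest differences, which is the pattern one would expect if sensitivity to pitch sets the limit of discrimination.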

The sense of intensity is measured in terms of the least 
perceptible difference in intensity or loudness from the 
standard sound. In producing the sounds for the musical 
test an audiometer is used. Two sounds are made for 
each trial, the differences in loudness between the two 
being of varying degrees. The subject is required to 
state for each set whether the second sound is weaker or 
stronger than the first. 

The sense of time is measured as perceptions of dif-
ferences in duration of time intervals. The arrangement
for recording this test on the phonograph record is one 
of a synchronous motor with a time-sense attachment. 
The attachment is such that clicks are produced by a 
projecting lever. These clicks, which occur at varying 
time intervals, are transmitted into a telephone receiver. 
From this point they are transferred to the phonograph 
record. In taking the test, the subject must indicate 
which time interval is the longest. For example, if 
there are three clicks, he indicates which interval is 
longer, the first or the second. If there are four clicks, 
he indicates the first, second or third. 

The rhythm test is somewhat similar to the test for 
sense of time. The same instrument and the same 
method are employed as in the time-sense test except that 
the intervals are arranged in rhythmic divisions, long and 
short; for example, two quarter-notes and a half-note. 
Four such measures may be given as the standard, and 
in the fifth, the half-note may be made too long or too 
short, the test being to determine the accuracy in the 
perception of the long intervals. 

The sense of consonance is the capacity for hearing dif- 
ferences in consonance and dissonance. It forms the 
basis of the ability to judge aesthetic effect in combina- 
tions of tones. The Seashore test for this quality is a 
test of appreciation of consonance and dissonance with 
reference to two-clangs, a two-clang being consonant 
when the two tones tend to blend and to produce a rela- 
tively smooth and pure clang; and, conversely, the two- 
clang being dissonant when the two tones do not blend 
and do not produce a smooth sound. The records re- 
corded on the phonograph consist of sets of two-clang 
sounds, and for each set the subject must indicate which 
one of the two sounds is the more consonant. 

The test for musical memory consists of a span of from
three to six musical notes which is played a second time 
with one of the notes changed. The subject taking the 
test must indicate which of the notes is changed in the 
second playing. The use of varying lengths of spans of 
notes gives an opportunity for distinguishing different de- 
grees of ability to remember the notes played. 

3. Studies of the Seashore test. (a) Reliability.
There has been considerable disagreement as to the reli- 
ability of the Seashore test. Probably the most extensive 
study of reliability reported is that of Larson.* This 
study was made in Lincoln, Nebraska, and Iowa City, 
Iowa. Altogether between 1,100 and 1,200 individuals 
were tested. These were distributed among fifth-, sixth-, 
seventh-, and eighth-grade school groups and an adult 
group of somewhat over 400. All the groups tested were 
composed of unselected individuals so far as musical 
ability and musical training were concerned. The tests 
were given under well-controlled conditions; all possible 
precautions were taken to eliminate the influence of ex- 
traneous factors. A preliminary trial or fore-exercise 
was given to insure understanding of the directions. 
Tests which clearly indicated that the directions were 
misunderstood were discarded. The tests were given to 
the students by the group method, one test being given 
daily. The same test was given a second time at the
same sitting after a short period of relaxation. 

The reliability coefficients of each test for the same 
sitting were found by correlating scores on the first test 
with those on the second test. This was done separately 
for all six measures for fifth-, sixth-, seventh-, and eighth- 
grade and adult groups. The Pearson product-moment 


* Larson, Ruth Crewdson, Studies on Seashore’s “Measures of Musical
Talent,” University of Iowa Studies, Vol. II, No. 6, 1927.




method of correlation was used in this study. The relia-
bility coefficients are summarized in Table XV.

Table XV

RELIABILITY COEFFICIENTS FOR SEASHORE TESTS

                     Average         Average Correlation
                     Correlation     for 5th, 6th, 7th,
Test                 for Adults      & 8th Grades

Pitch                  ^45                .85
Intensity              .765               .81
Time                   .705               .71
Consonance             .725               .52
Tonal Memory           .935               .87
Rhythm                 .700               .51


The final results showed that the pitch and memory 
tests are highest in reliability. Intensity ranks next, 
while consonance and rhythm show the lowest reliability 
in this battery of tests. The author of the study states 
that “the pitch and tonal memory tests are shown to be 
very satisfactory for use at all levels; the intensity and 
time tests are adequate for group measurements; and the 
consonance and rhythm measures should be reserved 
particularly for group averages and school surveys.” 
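
The reliability coefficient used in the study, the Pearson product-moment correlation between scores on the first and second administrations, can be illustrated as follows. The function is a plain sketch of the standard formula; the scores are hypothetical and are not taken from Larson's data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists of at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]  # deviations from the mean, first sitting
    dy = [b - mean_y for b in y]  # deviations from the mean, second sitting
    cross = sum(a * b for a, b in zip(dx, dy))
    # Undefined (division by zero) if either list has no variation at all.
    return cross / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


# Hypothetical scores for eight listeners on two administrations of one test.
first_sitting = [62, 75, 81, 58, 90, 70, 66, 85]
second_sitting = [65, 72, 84, 60, 88, 74, 63, 87]

reliability = pearson_r(first_sitting, second_sitting)
print(f"reliability coefficient: {reliability:.2f}")
```

A coefficient near 1.00 means that listeners keep nearly the same rank order on the two administrations; coefficients such as the .51 and .52 reported for the grade groups on rhythm and consonance indicate much less stable individual scores.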

In discussing the reliability of the battery of Seashore 
tests, Larson attempts to account for the differences 
found in the results of different investigators. The fol- 
lowing two paragraphs are quoted from the study. 

The differences found in the results of various in- 
vestigators might be attributed to numerous uncon- 
trolled factors. Some differences might be expected, 
since results have been offered in connection with studies 
made primarily for purposes other than determining the 
reliability coefficients of the tests. Factors which might 
influence reliability are motivation, controlled room 
conditions, reliability of the phonograph motor, num-
ber of preliminary trials before the test is actually
given, adaptation to ages, the number of tests given at 
one sitting, dishonesty of students in recording, and 
interpretation. The writer observed that the size of 
the group tested made a difference; the larger the group
the more difficult it was to keep controlled conditions. 

Another factor that might affect reliability is the skill 
of the experimenter. Often in various aspects of re- 
search attempts are made to use the Seashore tests for 
a certain study without adequate preparation in the 
psychology of music or without an adequate technique 
acquired from a routine of music testing. Contrary 
to a common impression, experience is necessary to get 
the best results with the Seashore tests. From the 
“Foreword” of Stanton’s Prognosis of Musical Achieve- 
ment, Dr. Howard Hanson, director of the Eastman 
School of Music, in evaluating the eight year testing 
program at that school states, “As a practical musician 
I have been convinced of their (the Seashore tests’) effi- 
cacy. I should wish, however, to add my belief that 
such testing is only of value when undertaken by 
thoroughly trained psychologists under conditions where 
control of experimentation is absolute. The undertak- 
ing of such a testing program by inexperienced and un- 
dertrained persons could only be a calamity.” 

(b) Validity. The study of the validity of the Sea- 
shore tests has presented a difficult problem because it 
necessitates having available, for comparison with the 
test scores, some reliable criterion of musical ability. As 
is obvious in a field where measurement has not been at- 
tempted before, it is practically impossible to obtain such 
a criterion. The test results have been checked with 
various indications of musical ability by different inves- 
tigators. Stanton made a study of music students at the 
Eastman School of Music over a period of four years, in 
which an attempt was made to compare the musical test 
scores with the length of time the students stayed in 
school. This investigator reports that “of 242 cases,
47% of the cases which tested below average continued 
in their work for a year; over one half of the low talent 
discontinued. 84% of the talent which was rated as 
above the average remained one year.” If we can as-
sume that lack of ability to pursue musical training is a
major cause of dropping out of the school, these figures 
indicate some validity of the test results. 

Larson attempted to measure the validity of the test 
by comparing the test records for groups of six levels of 
musical ability; namely, first-class musicians, semi-pro- 
fessional musicians, advanced amateurs, beginning stu- 
dents, adult non-musicians, and eighth-grade non-musi- 
cians. In general this investigator found an increasing 
difference between the first-class musicians and each 
group of lower rank. The differences between the last 
three groups, the adult non-musicians, beginning music 
students, and eighth-grade non-musicians, are not large. 
This is as would be expected. Two of the groups are 
unselected, and the beginning music students rank only 
slightly above average in musical ability. Comparisons 
can best be made by knowing the differences between the 
first-class musicians, the semi-professional musicians, the 
amateurs, and the three unselected groups. 

Larson also studied validity of the test by comparing 
test scores with school grades in music and with music 
teachers’ estimates of students’ ability. These studies 
were based upon students in the fifth, sixth, seventh, and 
eighth grades. The correlations which were obtained are 
summarized in Table XVI. 

(c) Norms. Norms are given for the six parts of the 
Seashore test for three levels of ability; namely, for fifth 
grade, eighth grade, and adult. One might wonder why 
these three levels have been selected for the statement of 
norms. Seashore states that these levels are selected be-
cause of their relationship to vocational guidance in
music, this being particularly true of the fifth- and 
eighth-grade levels. He thinks that the fifth grade rep- 
resents a good stage for a survey of the musical ability 
of pupils in school, for at this stage they are able to take 
a responsible attitude and it is early enough to start them 

Table XVI

VALIDITY COEFFICIENTS FOR SEASHORE TESTS

                     Correlation       Correlation with
                     with Grades       Teachers' Estimates
Test                 in Music          of Musical Ability

Pitch                  .32                  .31
Intensity              .19                  .11
Time                   .21                  .17
Consonance             .37                  .34
Tonal Memory           .47                  .46
Rhythm                 3XS                  .23


in a musical education in case it has been neglected up to 
that time. All who have good musical talent can be 
measured reliably at that age if the test is carefully done. 
At the eighth grade, according to Seashore, the child 
faces another turning point. Here some children transfer 
to the high school and trade schools and enter upon a new 
adjustment of studies marked by the beginning of elective 
studies. Others leave school to work, and the avocation 
for life is frequently chosen in this pre-adolescent stage. 
Seashore thinks that for both of these classes of pupils, 
the claims of music, particularly as an avocation, should 
be presented in the most attractive form and with specific 
knowledge of the natural endowment of the pupil for 
music. 

4. Studies of the relationship between the musical 
aptitude test and various other factors. The Seashore
test has been studied for its relationship to various other 
factors which may have some influence in determining 
musical ability. 

(a) Inheritance of musical traits. Stanton used the
Seashore test in a study of six of the foremost musical 
families in America. She examined 85 individuals in all. 
On the basis of the test results she concluded that there 
is a tendency for the inheritance of musical traits. 

(b) Racial influences. Some half dozen studies have
been made with the objective of comparing Seashore test 
records of whites and negroes. Most of these studies 
show no appreciable differences between the two groups 
in natural musical capacity. There is a study by John- 
son which shows a superiority of the negro in the sense 
of rhythm; a study by Lenoir shows a superiority of 
colored children in sense of time and sense of rhythm. 

(c) Training. A number of studies have been made 
on the effect of training upon the improvement of Sea- 
shore test scores. The results of these studies indicate in 
general that the capacities measured are relatively ele- 
mental in that they do not improve to any great extent 
with training. 

(d) Intelligence. Most of the studies that have been 
made on the relationship between Seashore test scores 
and intelligence show low positive correlations. Studies 
of this nature have been made by Fracker and Howard 
(university students), Hollingworth (children above 135 
I. Q.), and Larson (elementary school pupils). These 
studies seem to confirm the view of Seashore, that music 
tests are not in any significant way tests of intelligence. 

II. Measurement in the Field of Art 

More recently standard tests have been worked out 
in the field of art. Most of them measure art judgment
or artistic appreciation through the medium of responses
to pictures. Studies have indicated that this type of 
artistic capacity is basic to artistic productions in the 
fields of painting, sculpture, and the like, and is fairly 
closely correlated with skill and technique in artistic per- 
formances; hence such tests as we are about to describe 
can be considered to have value in the educational and 
vocational guidance of talented individuals. Probably 
the most carefully devised of these tests are the Meier- 
Seashore Art Judgment Test and the McAdory Art Test. 

1. The Meier-Seashore Art Judgment Test. The 
principles underlying the construction of this test have 
been stated by the authors as follows: 

The underlying principle of the test is that aesthetic 
judgment, resting upon fine discrimination, feeling, and 
insight, is basic to success in art, whether it be sculp- 
ture, painting, etching, or some form of applied art. 
The possession of it in a high degree is what separates 
the master from the crowd, or the artist who produces 
effective art consistently from the one who seldom does. 
Aesthetic judgment is defined as the capacity for perceiv- 
ing quality in aesthetic situations relatively apart from 
formal training. It may be further characterized as 
the capacity to sense artistic arrangement and effective 
use of line, color, and mass in a composition ; to recog- 
nize good proportion in a vase or architectural design; 
or to feel the rhythm or movement of any type of art. 
Now, it needs only to be pointed out that, having this 
capacity, the artist will be accurately critical of his own 
effort and will sense, more frequently than one who has 
it in lesser degree, how to turn his efforts to advantage. 

He will have a definite feeling of when to add or omit 
details when their inclusion or omission may make or 
mar a partly completed composition. Research into 
the methods of great painters indicates that this capac- 
ity, functioning particularly in the matter of selecting 
the best sketch to develop from a number of preliminary
studies, was one of the most significant traits of
the old masters. 

Talent in art is composed of a score or more of such 
related factors. In attempts to measure and evaluate 
these, some have been found to correlate slightly with 
ability to draw and apparent success in artistic pursuits;
others have shown practically no relationship. The position
taken here is that none of them is of such crucial
importance or as indispensable to ultimate success as
aesthetic sensitivity. The test measures the key-capacity,
regarding it as the most trustworthy and significant
index to talent and to probable success in an art
career. Almost any student of art with a reasonable
ability to draw may have some promise of success; yet
it would be unfortunate for such a person, handicapped
with a low degree of aesthetic judgment, to struggle
along in competition with others better equipped. At
least it would profit him to take fresh stock of his general
capacities and abilities in other lines.*

The test material consists of 125 pairs of pictures. 
The two pictures in each pair are nearly alike. They 
differ only in one respect, and the test blank indicates 
what that is in each case. The subject being tested re- 
cords which picture in each pair is the better by encircl- 
ing L for left picture or R for right picture. Pictures 9, 
10, and 11 are reproduced in Fig. 7.* 

Fig. 7. Samples from the Meier-Seashore Art Judgment Test.

L R   9  Location of the band.
L R  10  The arrangement of logs.
L R  11  Inclination of twig supporting bird.

* Meier, Norman C., and Seashore, Carl E., The Meier-Seashore Art
Judgment Test. Examiner’s Manual. Bureau of Educational Research
and Service, State University of Iowa, Iowa City, 1930.

* Reproduced by permission of Bureau of Educational Research and
Service, State University of Iowa, Iowa City.

Norms and distributions of scores on the test are published
for junior and senior high school students, for art
students, and for art teachers. The authors claim that
validity of the test is indicated by (1) method of selection
of the items; (2) differentiation in distributions of
scores made by general groups as compared with groups
of known artistic ability; (3) independence of scores of
specific art training, shown by occurrence of high scores
in general untrained groups; and (4) lack of significant
correlation with general intelligence and other general
ability measures. The selection of the items for the test
was based upon several considerations. Many of the
items are adaptations of “time-tested” pieces of art that
have survived centuries of criticism. Final selection of
the items was determined by, first, a favorable critical reaction
on the part of 25 experts, and second, a 60- to 90-per-cent
preference for the same item of the pair on the
part of 1,081 subjects.
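
The preference rule for retaining an item, and the encircle-L-or-R response format, can be illustrated with a short Python sketch. It is offered only as an illustration under stated assumptions: the function names are hypothetical, the 60 and 90 per cent bounds are treated as inclusive, and a subject's score is assumed to be the simple count of pairs on which the choice agrees with the keyed picture. The examiner's manual may define the score differently.

```python
def keep_item(n_preferring_keyed, n_subjects, low=0.60, high=0.90):
    """Return True when the share of subjects preferring the keyed
    picture falls between `low` and `high` (bounds assumed inclusive)."""
    if n_subjects <= 0:
        raise ValueError("n_subjects must be positive")
    share = n_preferring_keyed / n_subjects
    return low <= share <= high


def count_agreements(responses, key):
    """Count the pairs on which a subject's 'L' or 'R' choice matches the key.

    Assumption: the score is a plain count of agreements with the key.
    """
    if len(responses) != len(key):
        raise ValueError("responses and key must have the same length")
    return sum(1 for chosen, keyed in zip(responses, key) if chosen == keyed)


# Hypothetical figures: 700 of the 1,081 subjects prefer the keyed picture.
retained = keep_item(700, 1081)            # share is about .65, inside the band
too_easy = keep_item(1000, 1081)           # share is about .93, outside the band

# Hypothetical answer sheet for three pairs.
agreements = count_agreements(["L", "R", "L"], ["L", "L", "L"])
```

An item preferred by nearly everyone would discriminate poorly among subjects, which is presumably why an upper bound as well as a lower bound was set.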

2. The McAdory Art Test. This test, worked out 
and standardized by Margaret McAdory Siceloff, consists 
of 72 plates, each presenting four illustrations which 
treat a single subject, or test item, in four different ways. 
The subjects of the plates include furniture and utensils, 
textiles and clothing, architecture, painting, and other 
graphic arts. Each plate calls for discrimination in one 
or more of the following art elements: shape and line- 
arrangement, massing of dark and light, color. Plates 5 
and 23 are seen in Figs. 8a and 8b.* Those taking the test
are required to make a first choice, second choice, third 
choice, and fourth choice for each plate. The score is 
determined by the agreement of the testee’s choices with 
those set up by consensus of a large number of competent 
judges selected from among artists, art critics, art teach- 
ers, etc. 

According to the author’s reports, the test seems to
possess satisfactory reliability and validity. Scores on
the test show increase with age up to about eighteen,
differentiation between those with and those without artistic
manifestations or training, and some advantage among
females as compared with males of the same age and
training. Some of these findings are indicated in Table
XVII.*

Fig. 8a. Samples from the McAdory Art Test.

* Reproduced by permission of Bureau of Publications, Teachers College,
Columbia University, New York.


Table XVII

NORMS ON THE McADORY ART TEST

Based on Testing in New York City

                                                      Male   Female

Zero ability, or score attainable by mere chance        68      68
Ability of an average 10-yr.-old child                 103     114
Ability of an average 11-yr.-old child                 110     123
Ability of an average 12-yr.-old child                 116     132
Ability of an average 13-yr.-old child                 122     139
Ability of an average 14-yr.-old child                 127     145
Ability of an average 15-yr.-old child                 132     150
Ability of an average 16-yr.-old child                 136     154
Ability of an average 17-yr.-old child                 140     156
Ability of an average 18-yr.-old child                 143     158
Ability of an average adult                            146     160
Ability of first-year student in art school            173     179
Ability of college graduate engaged in teaching        162     180
Ability exceeded by only 1% or fewer of adult
  population of the sex in question                    202     220

* Siceloff, M. McAdory, and Woodyard, Ella, Validity and Standardization
of the McAdory Art Test, Bureau of Publications, Teachers
College, Columbia University, 1933, pp. 23 and 24.
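
The two trends claimed for the McAdory norms, namely increase with age and some advantage for females, can be read directly from the age rows of Table XVII. The Python sketch below simply tabulates those rows and takes differences; the variable names are ours, and nothing beyond the published norms is assumed.

```python
# Age norms from Table XVII: age in years -> (male norm, female norm).
NORMS = {
    10: (103, 114),
    11: (110, 123),
    12: (116, 132),
    13: (122, 139),
    14: (127, 145),
    15: (132, 150),
    16: (136, 154),
    17: (140, 156),
    18: (143, 158),
}

ages = sorted(NORMS)

# Female norm minus male norm at each age.
female_advantage = {age: NORMS[age][1] - NORMS[age][0] for age in ages}

# Gain in the norm from one age to the next, for each sex.
male_gains = [NORMS[b][0] - NORMS[a][0] for a, b in zip(ages, ages[1:])]
female_gains = [NORMS[b][1] - NORMS[a][1] for a, b in zip(ages, ages[1:])]

for age in ages:
    male, female = NORMS[age]
    print(f"age {age}: male {male}, female {female}, "
          f"difference {female_advantage[age]}")
```

The yearly gains shrink steadily toward age eighteen for both sexes, which agrees with the statement that scores increase with age up to about that point.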



CHAPTER IX 


The Measurement of Mechanical Aptitude 

WHAT are usually designated as mechanical aptitude
tests grew out of the limitations of ability
testing by the commonly used verbal intelligence tests. 
They were among the earliest of the special tests studied, 
Stenquist having begun work on the devising of his 
mechanical aptitude tests as early as 1915. 

When work was started on measurement of mechanical 
aptitude, abstract intelligence was about the only ability 
which had been measured. Thorndike had already sug- 
gested a convenient division of general ability into ab- 
stract intelligence, mechanical intelligence, and social in- 
telligence. And observation and experience without the 
aid of objective measuring devices had suggested that the 
second of these was sufficiently different from, and sufficiently
uncorrelated in amount possessed with, abstract
intelligence to make its measurement worth while. Sten- 
quist was particularly interested in the possibilities which 
such measurements might open to the school children 
who measured relatively low on abstract tests but 
possessed considerable ability of a more concrete sort. 
This view is reflected in the following quotation from 
Stenquist:¹

Of the relative importance of each of these two types
of ability, readers must form their own conclusions.
But it should be kept in mind that we are living in a
world that is dominated on every hand by every form
of mechanical device and machine. Every moment of
present-day life is influenced directly or indirectly by
the products of mechanical skill and genius. Is it not
important that ability in this field should be discovered
and developed? Rather than merely to dismiss our
apparently stupid pupils as low in what we now call
general intelligence, and to relegate them to some convenient
class, might not our time profitably be spent in
disclosing other kinds of intelligence of which they may
be possessed?

The question of “what knowledge is of most worth”
will probably never be finally answered to the satisfaction
of all. But it seems certain that as life becomes
more and more complex, the world’s tasks become more
varied, and group inter-dependence increases, there is
constant need for broader conceptions of what constitutes
worth-while mental ability. We should recall that
the history of the past century, as has often been said,
could well be written in terms of the achievement of applied
science and applied mechanical genius. Inventions
of hitherto undreamed of significance, which have
revolutionized or at least profoundly influenced the life
of every nation on the globe, have sprung from this field
of knowledge. And while the attempts to measure the
mental abilities back of these forces, which are herein
described, represent but crude beginnings, the importance
of the task is stoutly maintained. Indeed, to explore,
measure and adequately capitalize these capacities
seems at least as important as doing the same for
the more abstract type of intelligence required in academic
school subjects. The discovery of special abilities
has a two-fold significance and like the quality of
mercy “is twice blessed”: It not only opens the door
of new promise to pupils, many of whom have been
labelled as failures, but in doing so, it leads toward
further contributions to society.

1 Stenquist, J. L., Measurements of Mechanical Ability, Bureau of
Publications, Teachers College, Columbia University, 1928, p. 89.

Tests of mechanical aptitude usually aim at measuring
general aptitude in the management and manipulation
of things mechanical. They commonly imply a general
knowledge of mechanical principles and usages, but do 
not presuppose any special trade skill. They draw upon 
inherited motor abilities and upon a certain amount of 
acquired familiarity with common things of a mechanical 
nature and upon mechanical interests. Applied to young 
people, high amounts of the traits measured by mechani- 
cal aptitude tests presumably represent potentialities of 
success in vocations and tasks requiring skill in the use 
of tools, or the understanding of and operation of ma- 
chinery. 


I. The Stenquist Mechanical Aptitude Tests 

Stenquist developed two types of mechanical tests 
which have been widely used in schools and in industrial 
and business organizations, mainly in problems of vocational
guidance and selection. He developed two Assembly
Tests, known as Series I and Series II.² Each
consists of ten common mechanical contrivances which
have been taken apart and which must be assembled by
the testee. The two series contain the following articles:

SERIES I

1. Cupboard catch.
2. Chain.
3. Mouse trap.
4. Hunt paper clip.
5. Bicycle bell.
6. Shut-off.
7. Lock No. 1.
8. Push button.
9. Clothes pin.
10. Wire stopper.

SERIES II

1. Sash fastener.
2. Rope coupling.
3. Defiance paper clip.
4. Expansion nut.
5. Double-action hinge.
6. Calipers.
7. Elbow catch.
8. Lock No. 2.
9. Expansion rubber stopper.
10. Pistol.

2 Now made and sold by C. H. Stoelting Company, Chicago.

Score on the test depends upon speed and accuracy 
with which assemblies are made. Norms for various ages 
and grades have been prepared. A number of reliability 
coefficients, most of them above .60, are reported by the
author. 

Stenquist has made a number of studies of the validity 
of these assembly tests. He states that the best avail- 
able criterion of general mechanical ability of the kind 
supposedly measured by the tests has been manual-train- 
ing and science teachers’ ranks of pupils. To safeguard 
as far as possible against untrustworthiness of ranks, 
Stenquist utilized for the most part only classes having 
two shop teachers, so that their rankings could be 
checked against each other. Correlations between the 
Assembly Test scores and ranks for various classes which 
he studied are given in Table XVIII. Most of these are
high enough to indicate considerable validity for Stenquist’s
method; that is, they indicate (so far as the criterion can be
relied upon) that his tests really measure mechanical
ability.


Table XVIII

CORRELATIONS BETWEEN SHOP-TEACHER RANK AND
SERIES I OF STENQUIST TESTS

7th and 8th grade boys in Lincoln School               .83
8th grade boys in New York City Public Schools         .80
8th grade boys in New York City Public Schools         .42
6th and 7th grade boys, Horace Mann School             .81
6th grade boys in Horace Mann School                   .90
6th grade boys in Horace Mann School                   J8S
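
A coefficient relating test scores to a teacher's ranking of the same pupils can be computed as in the Python sketch below. The text does not say which coefficient Stenquist used, so the Spearman rank-difference formula, rho = 1 - 6(sum of d squared) / (n(n squared - 1)), is an assumption here. The pupil scores and teacher ranks are invented for illustration, and the helper names are hypothetical. The formula as written holds only when there are no tied ranks.

```python
def ranks_from_scores(scores):
    """Convert scores to ranks, the highest score receiving rank 1.

    Assumes no two scores are equal.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def spearman_rho(ranks_a, ranks_b):
    """Spearman rank-difference correlation for two untied rankings."""
    n = len(ranks_a)
    if n != len(ranks_b) or n < 2:
        raise ValueError("need two rankings of equal length, n >= 2")
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d_squared / (n * (n * n - 1))


# Invented data for five pupils: assembly-test scores and one teacher's ranks.
test_scores = [78, 65, 90, 52, 70]
teacher_ranks = [1, 4, 2, 5, 3]

test_ranks = ranks_from_scores(test_scores)   # [2, 4, 1, 5, 3]
rho = spearman_rho(test_ranks, teacher_ranks)
print(round(rho, 2))
```

With two shop teachers per class, as in Stenquist's procedure, the same function could be applied to the two teachers' rankings to check how far the criterion itself is consistent.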


The relationship between the Assembly Tests and gen- 
eral intelligence has also been extensively studied by 
Stenquist. Published correlations are shown in Table 
XIX. Unless otherwise indicated, general intelligence is 
measured by the Army Alpha Test. The correlations
indicate very little relationship between general intelligence
(abstract) and mechanical ability as measured by
the Stenquist Tests. 


Table XIX

CORRELATIONS BETWEEN STENQUIST ASSEMBLY TESTS
AND GENERAL INTELLIGENCE

Camp Taylor, 109 unselected men                          .323
Camp Devens, 107, foreign eliminated, but
  largely inferior cases                                 .35
Camp Lee, 76 unselected men                              .30
Camp Lee, 30 men below 50 in Army Alpha                  .00 approx.
Camp Lee, 216 men low grade, individually examined       .00 approx.
Camp Dix, 909 men, 303d Engineers, unselected            .51
Massachusetts School Feeble Minded, 30 cases
  with mental age                                        .32
Same group with officers’ ratings                        .25
For 100 7th- and 8th-grade boys, N. Y. Public
  Schools, between Series I and composite intelligence
  score, made up of Haggerty, National,
  Otis, Kelley-Trabue and Meyers                         .397
For same group, same tests, with Series II               .338


Because the Assembly Tests are limited in usefulness 
on account of their cumbersomeness, expensiveness, and 
individual test nature, Stenquist devised another test of 
a pictorial nature. This test is known as the Stenquist 
Mechanical Aptitude Test, two tests, I and II, being 
available. All the problems of the tests are presented in 
the form of pictures. The pictures are of common me- 
chanical devices, and the questions do not require special 
mechanical training or skill for their answering. A 
sample from Test I is shown in Fig. 9.³

Validity of the Picture Tests was studied by the same
methods as were followed in the Assembly Tests, and by
correlating the Picture Tests with the Assembly Tests.
Correlations with shop ranks of students are similar to
those found for the Assembly Tests. Correlations reported
by Stenquist between the Picture Tests and the
Assembly Tests average .67.

Fig. 9. Sample from the Stenquist Mechanical Aptitude Test.
(Answered by matching letters with numbers.)

3 From Stenquist Mechanical Aptitude Test I, Copyright by World
Book Company, Yonkers-on-Hudson, New York. Reproduced by written
permission of the publishers.

II. The Minnesota Mechanical Ability Tests⁴

In 1930, Paterson and others published the results of
an extensive investigation of mechanical aptitude tests
at the University of Minnesota. The investigators
studied many types of tests which have been used to indicate
mechanical ability, and arrived at batteries of tests
which were demonstrated to have validity for measuring
mechanical aptitude. Their tests were validated on boys
in shop courses, using a combined (1) “quality criterion,”
based upon quality of work done in shop courses, (2)
“quantity criterion,” based upon quantity of work done
in relation to quality, and (3) “information” criterion,
based upon objective information tests in the courses.
The battery of tests having the highest validity gave
a correlation of .81 with combined criterion. This battery
consisted of:

1. Minnesota Paper Form Board (see Fig. 10).⁵

2. Academic Grades.

3. Minnesota Assembly Test (similar to Stenquist).

4. Boy’s mechanical operations (based upon a questionnaire
about home mechanical pursuits).

5. Interest analysis blank (to find out about mechanical
interests).

6. Otis Intelligence Test.

The authors of the Minnesota Tests have studied the
relationships between their tests and a considerable number
of factors, as abstract intelligence, mechanical environmental
influences, age, sex, etc. For discussion of
these the reader is referred to the original publications.

4 Paterson, D. G., Elliot, R. M., et al.: Minnesota Mechanical Ability
Tests, University of Minnesota Press, Minneapolis, 1930.

III. The MacQuarrie Test for Mechanical Ability 

This is a test of the pencil-and-paper sort. It consists 
of the following parts: 

1. Tracing. 

2. Tapping. 

3. Dotting. 

4. Copying designs, with a series of dots as guide. 

5. Location (of dots in a group of letters).

6. Blocks (telling from a picture how many blocks a
given block touches).

7. Pursuit (following a line through a tangle of lines
and indicating where the line ends).

The MacQuarrie Test has the advantage of simplicity 
in administration and scoring. Its author reports a re- 
liability of .90 and correlations with estimates of mechan- 
ical ability ranging from .48 to .80. 


5 Reproduced by permission of the Marietta Apparatus Company,
Marietta, Ohio.

IV. The O’Rourke Mechanical Aptitude Test

O’Rourke’s series of vocational guidance tests includes
a mechanical aptitude test. This is mentioned particularly
as an example of a mechanical aptitude test including
more verbal material and somewhat more dependent
upon language responses than such others as the Stenquist
and MacQuarrie tests. O’Rourke’s test contains
two parts. Part One is a combination picture and verbal
test, the nature of which is indicated in Fig. 11.⁶ Part
Two is a verbal information test, in multiple-choice form,
about mechanical things. Two sample questions are:

1. A carpenter drives a nail with a: (1) key (2) hammer
   (3) axe (4) screw (5) file ..... Ans. ____

2. A carpenter drives a screw into wood with a: (1) hatchet
   (2) wrench (3) screw-driver (4) file (5) nut ..... Ans. ____

Tests such as this are usually somewhat more difficult
than the performance type, frequently show higher correlations
with abstract intelligence (probably because of
their verbal nature), and are less suitable for testing
among uneducated groups or groups of low abstract intelligence.
With many school groups, however, their usefulness
is not below that of the less verbal test.

Fig. 11. Sample directions from the O’Rourke Mechanical Aptitude Test.
(... have been slightly altered from the original.)

Each picture below marked with a number is USED WITH a picture
on the right marked with a letter. Look at picture No. 1, then look
at the pictures on the right with letters and write the letter of the
picture that is USED WITH it. Then find the picture that is USED
WITH No. 2. The samples are done correctly. Picture C is USED
WITH picture No. 1, so C is written after 1 on the line at the right.
B is USED WITH 2. “Nail” marked A is USED WITH “hammer”
marked 3, so write A after 3 on the line at the right.

6 From O’Rourke Mechanical Aptitude Test, Junior Grade. Quoted
by permission of the author.



CHAPTER X 


The Measurement of Aptitude for 
Vocations and Professions 


VOCATIONAL and professional aptitude tests are
designed to predict ability to pursue successfully 
the various vocations, professions, or forms of training. 
They usually include measurement of the following qual- 
ities: (1) Intelligence in terms of the job; (2) Infor- 
mation either about the job or relative to the job; (3) 
Interest in the job; and (4) Any measurable special 
abilities requisite for the job. In this chapter we shall 
attempt to give an illustration of a test representing this 
field of measurement. We have selected the aptitude 
test for entrance to a professional school rather than the 
vocational aptitude test, since we shall have an oppor- 
tunity to consider the latter in our discussion of psycho- 
logical tests in industry. 

The professional schools furnish excellent examples of 
the need for aptitude testing or of the need for testing 
fitness to profit by their instruction. From the stand- 
point of the student desiring to enter these schools, the 
training is expensive in both time and money. Much of 
the training is too specialized or technical to be of gen- 
eral value if the student is unable to complete the whole 
of the training. From the standpoint of the professional 
school, the cost of training is too high to eliminate the 
unfit by an attempt for a short time to train them. Misfits
need to be eliminated or redirected before the training
is begun. An unusual opportunity for the use of aptitude
tests is also created by the fact that the supply of
“would-be professionals” greatly exceeds the capacity of 
the training schools and also is beginning to exceed the 
vocational need in the world. The problem as it pre- 
sents itself, then, is one of selecting the most promising 
from the total number of applicants for admission to 
professional schools. 

In the professional schools much less testing has been 
done than has been done in general intelligence testing 
of college students. This lag is probably due to several 
reasons. First of all, the task itself is inherently difficult, 
all the more so because the carrying out of a successful 
testing program in such a situation demands an indi- 
vidual trained both in the profession and in the technique 
of psychological testing. Another factor has probably 
been the lack of familiarity with tests and the general 
prejudice against them on the part of professional school 
administrative officials. However, there have been a 
few instances of significant work in this field. Ferson
and Stoddard have published and standardized a Law
Aptitude Test.¹ In the testing of entrants to engineering
schools, considerable work has been done at Stanford University
by the use of a test which the author has termed a
Scientific Aptitude Test.² Thurstone also has an engineering
test of the aptitude type.³ Under the direction
of the Association of American Medical Colleges, an extensive
testing of medical school applicants has been
carried out. Tests have also been given to teachers-in-training.⁴
Since the testing program carried out in the
selection of medical students has been the most extensive,
we shall discuss it as a typical example of aptitude tests
for use in professional schools.⁵

1 Ferson, M. L., and Stoddard, G. D., “Law Aptitude,” American
Law School Review, 6:78, 1927.

2 Zyve, D. L., Stanford Scientific Aptitude Test, Stanford University
Press, 1^.

3 Thurstone, L. L., Vocational Guidance Tests for Engineers, World
Book Co., Yonkers-on-Hudson, N. Y., 1922.

I. The Problem 

Throughout the United States, there are 72 Class A 
medical schools which accept each year approximately 
7,000 students for admission to their freshman classes. 
During the last few years there have been between 13,000 
and 14,000 applicants per year. The problem, then, 
from the standpoint of the admission officials of the 
medical schools, is to secure the most promising 7,000 
from the total number of applicants. The problem of 
selecting medical students is further emphasized by a 
study of the mortality of the graduating classes in the 
medical schools. In the past, more than one-fifth of all 
those who have begun work in the freshman year have 
been unable to complete the four-year course. Here is 
an ideal situation for development of an instrument of 
measurement which will assist in the selection of the best 
students for admission to the professional school. For 
a long time, of course, criteria have been set up for ad- 
mission in these schools, and some of these criteria have 
been fairly exact in their selection. 
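
The size of the selection problem follows from simple arithmetic on the figures just given. The Python sketch below works it out; the function name is ours, and the one-fifth figure is treated as a lower bound because the text says "more than one-fifth."

```python
def selection_ratio(places, applicants):
    """Fraction of applicants who can be admitted."""
    if applicants <= 0:
        raise ValueError("applicants must be positive")
    return places / applicants


PLACES = 7000                        # freshmen accepted each year
APPLICANT_RANGE = (13000, 14000)     # applicants per year, per the text

ratios = [selection_ratio(PLACES, n) for n in APPLICANT_RANGE]

# "More than one-fifth" of entrants fail to finish, so at least this many
# of each entering class of 7,000 are lost before graduation.
minimum_non_completers = PLACES // 5

for applicants, ratio in zip(APPLICANT_RANGE, ratios):
    print(f"{applicants} applicants: {ratio:.1%} can be admitted")
print(f"at least {minimum_non_completers} entrants per class do not finish")
```

Roughly one applicant in two must be refused, and upward of 1,400 of those admitted each year have not completed the course, which is the double error that a valid aptitude test is meant to reduce.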

4 Moss, F. A., and others, Teaching Aptitude Test, Center for Psychological
Service, Washington, D. C., 1927.

5 Material on Medical Aptitude Test is summarized from Journal of
Association of American Medical Colleges, March 1930, Jan. 1931, May
1932, Jan. 1933, March 1934, and Jan. 1935. Quotations are made by
permission of the Association.

II. The Development of a Medical Aptitude Test

The Scholastic Aptitude Test for Medical Schools,
which we are to discuss in this chapter, was begun in
1927. The test represented an attempt to devise an instrument
which would indicate ability to pursue successfully
a medical course, and which might be used as
one of the determining factors in the selection of students
for admission to medical school. During the school year
1928-29, under the direction of the Association of
American Medical Colleges, the first form of this test was 
administered for experimental purposes in 14 medical 
schools to about 1,000 students. The tests were given 
about the middle of the school year to students who were 
in the freshman medical-school classes. As these stu- 
dents progressed through the medical school, their records 
were followed closely in comparison with the aptitude 
test scores which they had made on entrance. During 
the following year, 1929-30, a second form of the test 
was similarly applied to an experimental group of about 
5,000 freshman medical students, including those in most 
of the Class A medical schools in the country. Studies 
of these two experimental groups have been carried on 
up to the present time, some of the students having been 
followed through interneships and State Board Medical 
Examinations. Some of the results of these studies will 
be indicated in our discussion. Following the experi- 
mental study of these aptitude tests for medical schools, 
the tests have been adopted as a requirement for ad- 
mission to the medical schools and are now administered 
each year before applicants are considered for admission 
to freshman classes. 

III. The Nature of the Aptitude Test 

The test used in the studies being discussed has been 
developed in eight forms. These have contained the 
following parts: Scientific Vocabulary, Visual Memory, 
Memory for Descriptive Material, Pre-medical Information,
Learning and Retention of Material, Understanding
of Difficult Printed Material, Ability to Follow Direc- 
tions, Logical Reasoning, and Spelling of Scientific 
Words. All of the various parts of the test are given in 
terms of the scientific pre-medical terminology. The 
vocabulary test is selected from the terms met by the 
student in his pre-medical training in the basic sciences. 
The pre-medical information part of the test has also 
been drawn from the pre-medical courses in physics, 
chemistry, and biology. The learning parts of the test, 
including the visual-memory, memory-for-descriptive- 
material, and retention tests, are based upon a prelimi- 
nary study of material selected from the field of medicine, 
unfamiliar to the student at the time that he studies it. 
His memory or learning ability in each instance is meas- 
ured by his retention of the material studied. 

The following samples from parts of the test will in- 
dicate more clearly its general nature: 

SCHOLASTIC APTITUDE TEST FOR MEDICAL 
SCHOOLS (FORM 1) 

Test 1 — Scientific Vocabulary

Directions. If a pair of words mean the same or
nearly the same, encircle the S; if they mean the opposite
or nearly the opposite, encircle the O:

S O  1. anode-cathode
S O  2. porous-impermeable
S O  3. static-dynamic
S O  4. mechanistic-vitalistic
S O  5. vaporize-volatilize
S O  6. relationship-correlation
S O  7. deviation-aberration
S O  8. reduction-oxidation
S O  9. objective-subjective
S O 10. conduit-duct



Test 2 — Pre-Medical Information 

Directions, If the statement is true, encircle the T] 
if it is false, encircle the F: 

T F 1. Limestone is an important source of calcium. 
T F 2. Ionization always accompanies electrolysis. 
T F 3. Hydrofluoric acid attacks glass. 

T F 4. Glycerol contains three hydroxyl groups. 

T F 5. The skull of vertebrates consists of a number 
of bones. 

T F 6. Hydrogen is generally prepared in the labo- 
ratory by the reaction of an acid and a metal. 
T F 7. Malaria is transmitted by a certain species 
of the Anopheles mosquito. 

T F 8. Standard conditions exist at a temperature 
of zero degrees Centigrade and a pressure of 
760 mm. of mercury. 

T F 9. At the completion of a chemical reaction the 
properties of a catalyst remain unchanged. 

T F 10. Acetic acid is highly ionized. 

T F 11. The color of an object depends upon the 
wave lengths produced by it reaching the eye. 
T F 12. The salivary glands are endocrines. 

T F 13. The latent period in muscular contraction is
the interval between the point of highest con- 
traction and the beginning of relaxation. 

T F 14. The spinal cord is part of the central nerv- 
ous system. 

T F 15. Thiocyanates contain sulphur. 

Test 3 — Visual Memory

Directions. Below is the diagram which you studied 
at the beginning of the test. The various parts are 
numbered and the names appear at the right. Put in 
the blank before each name the number which desig- 
nates that particular part. 






Arch of Aorta 
Inferior Vena Cava 
Ascending Aorta 
Pulmonary Artery 
Thyroid Gland 
etc. 


Test 4 — Memory for Content 

Directions. If the statement is true, encircle the T; 
if it is false, encircle the F: 

T F 1. The phrenic nerves supply the heart. 

T F 2. The coronary arteries supply the heart mus- 
cle. 

T F 3. In general, the venous apparatus is shown on 
the right side of the heart machinery. 

T F 4. The thyroid gland is situated between the 
thoracic aorta and inferior vena cava. 

T F 5. All blood leaves the heart from the ventri- 
cles. 

Test 5 — Comprehension and Retention 

Directions. Answer the following questions in ac- 
cordance with the passage on “Speech Defects” which 
you read at the beginning of the examination. If the 
statement is true according to this passage, make a 
circle around the T; if it is false, make a circle around 
the F:

T F 1. Psychoanalysis is used in treating speech de- 
fects due to emotional disturbances. 

T F 2. Some individuals are able to recognize objects 
but are unable to name them. 




T F 3. All paralyses of the vocal apparatus are 
caused by brain injuries. 

T F 4. Psychoanalysis is pointed out as being most 
effective in curing organic speech defects. 

T F 5. Injuries to the seventh nerve cause paralysis 
of certain of the muscles used in speech. 

Test 6 — Understanding of Printed Material 

Directions. Read the selection below and then an- 
swer the questions following it by writing the answers 
on the lines at the right. You may read the selection 
as many times as you desire and refer to it as often as 
necessary. 

Planes and Diameters of the Pelvis 

The planes of the pelvis are usually designated as (1) 
the superior strait, (2) the inferior strait, (3) the plane 
of the greatest, and (4) the plane of the least, pelvic 
dimensions. 

The superior strait represents the upper boundary of 
the cavity, and is frequently spoken of as the pelvic inlet. 

It is somewhat oval in shape, with a depression on its 
posterior border corresponding to the promontory of the 
sacrum, and is sometimes described as blunt heart- 
shaped. It is bounded posteriorly by the promontory 
and alae of the sacrum; laterally by the linea terminalis;
anteriorly by the horizontal rami of the pubic bones
and the upper margin of the symphysis pubis. Strictly 
speaking, it is not a mathematical plane, since its lateral 
margins, as represented by the linea terminalis, are at 
a lower level than its central portion between the prom- 
ontory and symphysis. Etc. 

1. What is another name for the superior strait?

2. What diameter was named by Roederer?

3. Which of the four diameters of the superior strait furnishes the basis for estimating the size of the pelvis?

4. Which is the longest diameter of the superior strait?

5. What diameter is designated as the conjugata vera?

IV. The Validity of the Test 

The validity of this aptitude test has been indicated 
principally by a study of the relationship between the 
test scores of students and their subsequent performance 
in the medical schools, their performance being based 
upon their medical school records. The students in the 
experimental groups who entered medical school in the 
fall of 1928 or the fall of 1929, and certain students tested 
later in pre-medical colleges, have been followed through 
their four years of medical school work. The average 
correlation between their four-year scholarship averages 
and their Medical Aptitude Test scores is stated as .59. 
This relationship is somewhat higher than that usually 
obtained between ordinary intelligence tests and scho- 
lastic records. The value of the test has also been 
studied from the standpoint of percentage of failures and 
percentage of high marks obtained in medical school by 
students of various test score levels. Such a study for 
one of the experimental groups is indicated in Fig. 12. 
The students were divided into ten equal groups on the 
basis of their scores on the aptitude test. Each bar of 
the chart shows the percentage of failures occurring in 
the four years and the average grade of the group. It is 
of interest to note that in the highest tenth no man has 
failed in any of the four years of the medical course, and 
the group shows an average grade for the four years of 
over 86 (an exceptionally high average for medical school 
grades). Eleven per cent of this group graduated with 
final medical school averages of 90 or higher. On the
other hand, at the end of the medical course, 60 out of
100 failed in the lowest tenth, and the final average for 
this whole tenth is only 75, just barely passing. No stu- 
dent shows an average as high as 90, and the majority of 
those who graduated show low grades and failures in 
some courses during the four years. 
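The two analyses described above, a correlation between test scores and four-year averages and a breakdown of failures and grades by score group, can be sketched as follows. This is only an illustration: the records are invented, the function names are ours, and the group count is reduced to four because the made-up sample is small.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def score_group_summary(records, n_groups=10):
    """Split students into equal groups by test score, highest group first.

    records: list of (test_score, four_year_average, failed) tuples.
    Returns one (percent_failed, average_grade) pair per group.
    Assumes len(records) is divisible by n_groups.
    """
    ranked = sorted(records, key=lambda r: r[0], reverse=True)
    size = len(ranked) // n_groups
    summary = []
    for g in range(n_groups):
        group = ranked[g * size:(g + 1) * size]
        pct_failed = 100.0 * sum(1 for r in group if r[2]) / len(group)
        avg_grade = mean(r[1] for r in group)
        summary.append((pct_failed, avg_grade))
    return summary


# Hypothetical records, invented for illustration only.
records = [
    (210, 88.0, False), (205, 86.0, False),
    (160, 84.0, False), (155, 82.0, False),
    (120, 81.0, False), (110, 79.0, True),
    (90, 76.0, True), (80, 74.0, True),
]

# Four groups instead of ten, since there are only eight records.
summary = score_group_summary(records, n_groups=4)
r = pearson_r([rec[0] for rec in records], [rec[1] for rec in records])
```

With real data the same function called with `n_groups=10` would give the ten groups charted in Fig. 12.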


[Fig. 12: bar chart. Each bar shows the percentage of failures in the four years (not recoverable from the scan) and the average grade for one tenth of the students, grouped by test score.]

Test score group                 Average grade for four years
Highest Tenth of Test Scores     86.3
Second Decile                    84.9
Third Decile                     84.9
Fourth Decile                    83.7
Fifth Decile                     83.4
Sixth Decile                     83.1
Seventh Decile                   82.? (last digit illegible)
Eighth Decile                    81.7
Ninth Decile                     81.6
Lowest Tenth of Test Scores      75.0

Fig. 12. — Distribution of Medical School Grades According to
Medical Aptitude Test Scores.


Based on the distribution of medical school perform- 
ances over the four-year course for these students, we 
may say that: 

If a student has a score as high as the upper tenth
tested:

(1) The chances are 100 per cent that he will graduate
from the medical school; 

(2) and the chances are 3 to 1 that he will average 85 
or over for the whole four years. 

On the other hand, if he is as low as the lowest tenth 
tested: 

(1) The chances are 60 out of 100 that he will not be 
able to graduate, because of failure to carry the 
work successfully; 

(2) and the chances are 9 to 1 that he will have an 
average below 85 if he does graduate; 

(3) or 2 to 1 that he will have an average below 80. 
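The statements above mix two ways of expressing chance: odds ("3 to 1") and chances out of 100. A short sketch of the conversion may help in comparing them. The function names are ours, and the conversion is the ordinary arithmetic one rather than anything specific to the study.

```python
from fractions import Fraction


def odds_to_probability(favorable, unfavorable):
    """Convert odds of 'favorable to unfavorable' into a probability."""
    return Fraction(favorable, favorable + unfavorable)


def as_percent(probability):
    """Express a probability as chances in 100, rounded to one decimal."""
    return round(float(probability) * 100, 1)


# Odds quoted in the text.
upper_tenth_85_or_over = odds_to_probability(3, 1)   # "3 to 1"
lowest_tenth_below_85 = odds_to_probability(9, 1)    # "9 to 1"
lowest_tenth_below_80 = odds_to_probability(2, 1)    # "2 to 1"

# "60 out of 100" is already a probability, not odds.
lowest_tenth_not_graduating = Fraction(60, 100)
```

On this reading, odds of 3 to 1 correspond to 75 chances in 100, 9 to 1 to 90 in 100, and 2 to 1 to about 67 in 100.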

The validity of the Medical Aptitude Test has been 
further studied by comparing the success of certain of 
the students as internes in hospitals with their original 
Medical Aptitude Test scores. Such a study is reported 
for approximately 500 students interning in hospitals 
during the year 1932-33. Each student so tested was 
rated on his interne work by the superintendent or physi- 
cian of the hospital in charge of internes. The ratings 
were made on a scale of one to five, “1” being a high
rating and “5” a low rating. The interpretations of these
ratings as given to the rater were as follows: “1” comes
up to the best interne hospital has had; “2” is good, above
average, but not equal to the best interne; “3” is equal 
to the average interne hospital has had; “4” is below the 
average interne, but better than the poorest hospital has 
had; “5” is among the poorest internes hospital has had. 
The relation between the aptitude test scores and the 
ratings as internes is shown in Fig. 13. 

[Fig. 13: bar chart of interne ratings for four levels of aptitude test score. The average ratings are legible in the scan; most of the percentage labels are not.]

Aptitude test score                      Average rating as internes
Above 200 (Highest Tenth)                1.6 (none rated 4 or 5)
150-199 (Above Median Score)             2.2
100-149 (Below Median)                   2.4
Below 100 (Lowest Tenth Graduating)      3.2 (37% rated 4 or 5)

Fig. 13. — Distribution of Interne Ratings According to Medical
Aptitude Test Scores.

Also of interest in our discussion of this aptitude test
is the comparative study which has been made of the test
and other criteria often used in the admission of students
to the medical schools. In any study of a new method of
selecting students or employees, it would be logical to
expect that a battery of criteria for admission or selection,
the parts of which will supplement each other, would be
the best. In the medical schools there are various factors
which have in
the past been considered in admitting students. The 
most commonly used of these are: grades on pre-medical
subjects; number of semester hours credit offered in pre- 
medical college work; personal interviews and character 
records; and in some instances, age at entrance. A com- 
parative study of the value of each of these factors in 
predicting the failures occurring in medical schools has 
been reported for 1,000 medical students graduating in 
June 1932. The relative advantages of the five criteria 
indicated are shown in Fig. 14. The predictions of per- 
centages of failures refer to percentages of actual failures 
which occurred in the group throughout their four years.
The percentage of good students excluded is indicated,
since it is obvious that any criterion that is used to elim- 
inate failures must be one which excludes as few good 
students as possible. These figures do not, of course, 
consider any possible good students eliminated by the
standards now in use (as “passing” pre-medical grades,
or 60 semester hours of admission credits). Since students
falling below these standards were not admitted,
there was no chance to find out anything about them so
far as medical school performance was concerned. Another
factor studied and reported for a smaller group was
the personal interview often given medical school applicants.
Thirty-three per cent of failures predicted has
been indicated for this criterion of admission. Such
studies as this indicate the relative validity of the test
itself, and also emphasize the value of the test for use
in conjunction with other criteria for selecting students,
especially in conjunction with pre-medical grades.

[Fig. 14: bar chart comparing admission criteria. Some labels are illegible in the scan.]

Criterion                                             Failures predicted   High averages excluded
Combination criterion (test and premedical
  scholarship)                                        69%                  4%
Aptitude test scores                                  53%                  5%
Premedical scholarship (condition illegible)          44%                  25%
Entrance credits (if admit with 70 or more
  semester hours)                                     36%                  (illegible)
Age at entrance (if only below [illegible])           26%                  13%

Fig. 14. — Comparison of Various Criteria for Admission to Medical
Schools.
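The two figures reported for each criterion, percentage of failures predicted and percentage of good students excluded, can both be computed from the same records once a cutoff on the criterion is chosen. The sketch below shows one way to do so. The data are invented, the cutoff is arbitrary, and the threshold of 85 for a "high average" is our assumption, since the text does not state the one used in the study.

```python
def evaluate_cutoff(students, cutoff, high_average=85.0):
    """Evaluate an admission cutoff on one criterion.

    students: list of (criterion_value, failed, final_average) tuples;
              final_average is None for students who failed.
    Returns (percent of failures predicted, percent of high averages excluded),
    where "predicted" and "excluded" both mean falling below the cutoff.
    """
    failures = [s for s in students if s[1]]
    good = [s for s in students if not s[1] and s[2] >= high_average]
    failures_predicted = sum(1 for s in failures if s[0] < cutoff)
    good_excluded = sum(1 for s in good if s[0] < cutoff)
    return (100.0 * failures_predicted / len(failures),
            100.0 * good_excluded / len(good))


# Hypothetical students: (aptitude test score, failed, final average).
students = [
    (95, True, None), (110, True, None), (150, True, None),
    (105, False, 86.0), (160, False, 88.0), (180, False, 90.0),
    (200, False, 91.0), (140, False, 80.0),
]

predicted, excluded = evaluate_cutoff(students, cutoff=120)
```

A criterion is the more useful the higher the first figure and the lower the second; raising the cutoff generally raises both.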

Dr. Moss, who has acted as Secretary and Director of 
Study for the Committee on Aptitude Tests of the Asso- 
ciation of American Medical Colleges since its foundation, 
in one of his reports for the Committee summarized the 
value of the aptitude test as follows: 

We present the Aptitude Test as an additional crite- 
rion for selecting the best students for training in our 
medical schools. Studies of the validity of the various 
criteria which might be employed seem to indicate that 
the Aptitude Test is, in most instances, the best single 
criterion that we have; but we do not contend that in it
we have a panacea. No criterion which we have been 
able to discover or set up is perfect; we can arrive at 
the best selection only by the wisest use of all the cri- 
teria available which demonstrate a real relationship 
to ability to pursue successfully a medical course. 

Success in medical school undoubtedly depends on a 
complex set of factors, the most important of which 
may be listed as: 

(1) Innate ability or aptitude, for which our 
test is a very usable indication. 

(2) Previous preparation, a fair index of which 
is given by pre-medical grades. 

(3) Energy: It occasionally happens that a 
student has good ability and a satisfactory prep- 
aration but fails because of lack of energy. A 
physical examination may be of assistance in 
such cases. 

(4) Social adaptability or ability to fit in with 
fellow students and not antagonize instructors.
We have no reliable measure of this, but ratings
by pre-medical instructors may be of service. 

All of these four factors should be taken into consid- 
eration in selecting medical students. Efforts should be 
made to secure more reliable methods for measuring 
these factors. 



CHAPTER XI


The Measurement of Interests 

One might question the inclusion of interests in
this section of the book. No claim is made that 
interests are to be classed in the category of aptitudes or 
special talents. They are, however, usually somewhat 
specialized; at least, the infrequency of finding only 
general and broad interests makes the measurement of 
specialized interests desirable. Also, many of the same 
reasons for measurement of special abilities and apti- 
tudes may be suggested as reasons for measurement of 
interests. We have, therefore, included the discussion of 
interests in this section dealing with measurement of 
aptitudes. 

Interests have to do with our reactions to stimulus 
elements in our environment. The stimulus elements 
may be concrete objects, persons, ideas, techniques, or 
almost anything that we may think of as capable of call- 
ing out a reaction or a response in an individual. Fryer 
distinguishes between “subjective interests” and “objec- 
tive interests”; and he also classes the various means 
that have been devised for measuring interests into these 
two groups. Subjective interests are dependent upon the 
feelings that accompany interest experiences. If the 
feeling is one of pleasantness, we term it an interest; if 
one of unpleasantness, an aversion. In the terminology 
of many of the measuring devices, interests are likes and
aversions are dislikes. Between these are the indifferent
feelings of neither interest nor aversion. Objective interests
are observable reactions to stimuli. If one walks
down the street and stops to look intently at a sports ex- 
hibit in a window, we may say that he is showing a mani- 
festation of interest that is objectively observable. 

Vocational and occupational problems have probably 
carried the greatest weight in directing the attention of
psychologists to the measurement of interests. Those 
who have given their attention to the problem have come 
to realize that the most helpful vocational guidance con- 
siders one’s interest in following a vocation as well as his 
abilities for pursuing it; that ideal methods of selecting 
employees must take account of the fact that failures in 
work can occur from lack of interest as well as from lack 
of capacity or training. Fryer has emphasized another
reason, a cultural reason, for the attention to measurement
of interests:¹

The cultural importance of interest measurement 
... is only now becoming recognized. We have so long 
regarded abilities as the criterion of life’s success that 
we have neglected the philosophy of happiness. To 
view interest measurement in true perspective we must 
start with an assumption different from the one cur- 
rent in this commercial age in which successful accom- 
plishment is the criterion of measurement. Happy 
accomplishment is the foundation of a modern individualistic
philosophy. Interest measurement is concerned
with a distribution of interests which this
philosophy assumes as the basis of happiness. The sig- 
nificance of interest measurement lies not in its relation 
to social efficiency, but rather in its measurement of a 
cultural development which is related to social happi- 
ness. 


1 Fryer, Douglas, The Measurement of Interests, Henry Holt & Co.,
New York, 1931, p. v.


I. The Methods of Measuring Interests 

The chief methods that have been used in measuring 
interests are (1) inventories or questionnaires, (2) rating 
scales, (3) information tests, (4) free association tests. 
The first two of these fall in Fryer’s classification of sub- 
jective measurement of interests. They represent an 
estimate by the individual of how he has reacted in the 
past or would react in the future to the stimuli of the 
test. The third and fourth methods are objective meth- 
ods of measurement. These tests measure reactions to 
stimuli under standard stimulation conditions, without 
consideration of the estimate by the person being tested 
or by others. 

1. The inventory or questionnaire. The interest in- 
ventory presents to the individual a list of items calcu- 
lated to stimulate people to feelings of like or dislike, 
and asks him to estimate his feeling toward the items just 
as if they were stimulating him. The items may be oc- 
cupations, books, magazines, people, amusements, or 
courses of study. Most frequently the estimated inter- 
ests toward the items are to be recorded in terms of Like, 
Dislike, or Indifference. Finally, the interest inventory 
usually provides a method of scoring or summating the 
responses of the testee to indicate degree of interest in 
various broad fields, as an occupation, or a field of inter- 
est such as mechanical interest. 

The best example today of a standardized interest in- 
ventory for general use with adults is the Vocational In- 
terest Blank by E. K. Strong, first published in 1928. 
Very recently Strong has published a form especially 
adapted for women. The “Vocational Interest Blank” 
represents a revision and extension of an earlier inventory 
worked out by Cowdery. Together at Stanford University,
Cowdery and Strong have conducted the most extensive
studies available on measurement of interests.
Strong’s interest blank contains 420 items classified into 
eight parts, as follows: (1) Occupations, (2) Amuse- 
ments, (3) School Subjects, (4) Activities, (5) Peculiari- 
ties of People, (6) Order of Preference of Activities, (7) 
Comparison of Interest between Two Items, (8) Rating 
of Present Abilities and Characteristics. Sample items 
from the blank are quoted to show its nature. The items 
are in most instances to be marked as Like (L), Indifferent
toward (I), or Dislike (D).²

Part I. Occupations 


Actor (not movie) L I D 

Advertiser L I D 

Architect L I D 

Army Officer L I D

Artist L I D 

Astronomer L I D 

Athletic Director L I D 

Auctioneer L I D 

Author of Novel L I D 

Author of Technical Book L I D 

etc. 

Part II. Amusements 

Golf L I D 

Fishing L I D 

Hunting L I D 

Tennis L I D 

Driving an Automobile L I D 

Taking Long Walks L I D 

Boxing L I D 

Checkers L I D 


2 Strong, E. K., Jr., Vocational Interest Blank, Stanford University 
Press, Stanford University, California, 1928. 




Chess L I D 

Poker L I D 

etc. 

Part III. School Subjects

Algebra L I D 

Agriculture L I D 

Arithmetic L I D 

Art L I D 

Bible Study L I D 

Bookkeeping L I D 

Botany L I D 

Calculus L I D 

Chemistry L I D 

Civics L I D 

etc. 

Part IV. Activities

Repairing a clock L I D 

Making a radio set L I D 

Adjusting a carburetor L I D

Repairing electrical wiring L I D 

Cabinet making L I D 

Operating machinery L I D 

Handling horses L I D 

Giving “first-aid” assistance L I D 

Raising flowers and vegetables L I D 

Decorating a room with flowers L I D 

etc. 

Part V. Peculiarities of People 

Progressive people L I D 

Conservative people L I D 

Energetic people L I D 

Absent-minded people L I D 

People who borrow things L I D 

Quick-tempered people L I D 

Optimists L I D 

Pessimists L I D 



People who are natural leaders L I D

People who assume leadership L I D 

etc. 


Part VI. Order of Preference of Activities

(Three lists of ten activities each. In each list the 
three enjoyed most and the three enjoyed least must be 
checked.) 

Part VII. Comparison of Interest Between Two Items

Street-car motorman ( ) ( ) ( ) Street-car conductor
Policeman ( ) ( ) ( ) Fireman (fights fire)
Chauffeur ( ) ( ) ( ) Chef
Head waiter ( ) ( ) ( ) Lighthouse tender
House-to-house canvassing ( ) ( ) ( ) Retail selling
House-to-house canvassing ( ) ( ) ( ) Gardening
Repair auto ( ) ( ) ( ) Drive auto
Develop plans ( ) ( ) ( ) Execute plans
Do a job yourself ( ) ( ) ( ) Delegate job to another
Persuade others ( ) ( ) ( ) Order others


Part VIII. Rating of Present Abilities and Characteristics

(Each item is answered in one of three columns: Yes, ?, No.)

Usually start activities of my group ( ) ( ) ( )
Usually drive myself steadily (do not work by fits and starts) ( ) ( ) ( )
Win friends easily ( ) ( ) ( )
Usually get other people to do what I want done ( ) ( ) ( )
Usually liven up the group on a dull day ( ) ( ) ( )
Am quite sure of myself ( ) ( ) ( )
Accept just criticism without getting sore ( ) ( ) ( )
Have mechanical ingenuity (inventiveness) ( ) ( ) ( )
Have more than my share of novel ideas ( ) ( ) ( )
Can carry out plans assigned by other people ( ) ( ) ( )


(a) The selection of items for interest inventories. 
Evolution of the interest inventory shows two criteria 
that have been set up in the selection of items. One is a 
sampling criterion. No inventory could contain all the 
items of interest, or aversion, with respect to a particular 
interest group, so the aim has been to obtain an adequate 
sampling of the total possible items. A second criterion 
has been a discrimination criterion. The aim has been to 
select items that discriminate in the interest estimates 
assigned them between groups of people, occupational 
groups in particular, and to omit those items that are 
common interests or aversions of the different groups to 
be distinguished by the inventory. Such discriminative 
value of the items of an inventory can be discovered only 
through detailed tabulation of responses made by various 
groups of persons. 
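The tabulation that the discrimination criterion calls for can be sketched briefly. The responses below are invented, the item names are borrowed from the sample blank, and the 20-point threshold is an arbitrary assumption of ours rather than a value taken from any published inventory.

```python
from collections import Counter


def percent_responses(blanks, item):
    """Percentage of L, I, and D responses to one item within a group."""
    counts = Counter(blank[item] for blank in blanks)
    return {resp: 100.0 * counts[resp] / len(blanks) for resp in "LID"}


def discriminating_items(group, others, items, min_difference=20.0):
    """Keep items on which the two groups' response percentages differ
    by at least min_difference percentage points for some response."""
    keep = []
    for item in items:
        pg = percent_responses(group, item)
        po = percent_responses(others, item)
        if max(abs(pg[resp] - po[resp]) for resp in "LID") >= min_difference:
            keep.append(item)
    return keep


# Hypothetical completed blanks for an occupational group and a general group.
engineers = [
    {"Repairing a clock": "L", "Golf": "L"},
    {"Repairing a clock": "L", "Golf": "I"},
    {"Repairing a clock": "L", "Golf": "D"},
    {"Repairing a clock": "I", "Golf": "L"},
]
others = [
    {"Repairing a clock": "D", "Golf": "L"},
    {"Repairing a clock": "I", "Golf": "I"},
    {"Repairing a clock": "D", "Golf": "D"},
    {"Repairing a clock": "L", "Golf": "L"},
]

kept = discriminating_items(engineers, others, ["Repairing a clock", "Golf"])
```

In this made-up sample the two groups answer "Golf" identically, so that item would be omitted, while "Repairing a clock" separates them and would be retained.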

(b) The scoring of interest inventories. Strong’s scor- 
ing of his Vocational Interest Blank exemplifies the most 
refined technique in the scoring of interest inventories. 
Scores may be obtained for interest in 30 different occu- 
pations. For each occupation, each possible answer re- 
ceives a scoring weight dependent upon the difference in 
per cent of answers of the occupational group for which 
the scoring is being worked out, and a group of “men in
general” representing many occupations. Strong employs
a special formula for working out these weights.³
The general validity of the weights assigned to items 
is reflected in the differentiation of the occupational 
group in question from “men in general” when scored for 
interest in the occupation. This may be illustrated by 
Strong’s study of the occupation of personnel manager. 
When the interest inventories are scored with the personnel
managers’ scoring key, the scores are distributed
as shown in Fig. 15. The range of scores for the personnel
managers is from 45 to 285, for the non-personnel group
from -175 to 225. The overlapping by the non-personnel
group is about 28 per cent. Seventy-two per cent
of the personnel managers are distinguished from the non-personnel
men, or, to state the situation another way:
the chances are 72 in 100 of selecting personnel managers
from non-personnel men by their interests.

Fig. 15. — Distribution of Scores for Interest in Personnel
Management.
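The general idea of such scoring can be illustrated with a deliberately simplified scheme in which each answer's weight is just the difference, in percentage points, between the occupational group and "men in general." This is not Strong's formula, which the text does not reproduce; the percentages are invented, and the function names are ours.

```python
def item_weights(group_pct, general_pct):
    """Weight for each response = occupational-group percentage minus
    general-group percentage (a simplification, not Strong's formula)."""
    return {
        item: {resp: group_pct[item][resp] - general_pct[item][resp]
               for resp in "LID"}
        for item in group_pct
    }


def score_blank(blank, weights):
    """Sum the weights of the responses marked on one blank."""
    return sum(weights[item][resp] for item, resp in blank.items())


# Hypothetical response percentages for two items from the sample blank.
personnel_pct = {
    "Energetic people": {"L": 80, "I": 15, "D": 5},
    "Repairing a clock": {"L": 20, "I": 40, "D": 40},
}
general_pct = {
    "Energetic people": {"L": 30, "I": 40, "D": 30},
    "Repairing a clock": {"L": 35, "I": 35, "D": 30},
}

weights = item_weights(personnel_pct, general_pct)

# Two hypothetical respondents.
score_a = score_blank({"Energetic people": "L", "Repairing a clock": "D"}, weights)
score_b = score_blank({"Energetic people": "D", "Repairing a clock": "L"}, weights)
```

A respondent who answers as the occupational group typically does accumulates a positive score, and one who answers in the opposite way a negative score, which is consistent with the negative scores reported for the non-personnel group.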

Strong reports similar studies for his various other 
occupational groups. For example, chances in 100 of dis- 
tinguishing the occupational groups from a general group 
of mixed occupations are: 95 for artists, 84 for ministers,
75 for lawyers, and 64 for engineers. Not all occupations
that Strong has studied, however, can be differentiated
to a degree that would make their interest measurement
by his inventory of practical value. His “executive”
group could not be distinguished to any practical degree
from non-executive groups. The chances of selecting
“executives” by interest score from a non-executive group
were only 30 in 100.

3 Strong, E. K., Jr., “Procedure for Scoring an Interest Test,” Psychological
Clinic, Vol. XIX, 1930, p. 63.

2. The rating scale. In the measurement of interests, 
the rating scale has been relatively little used as compared 
with the inventory or questionnaire. In many instances, 
however, the rating technique would seem to be as 
applicable to the field of measurement of interests as to 
that of measurement of abilities. We may illustrate the 
technique in the measurement of interests by a procedure 
used by Cox in studying the interests of a group of genius 
children.⁴ She used a seven-point scale ranging from
plus 3 to minus 3. Plus 3 was the rating assigned for
interest of the highest degree; plus 2, interest considerably
above average; and so on, to minus 3, which designated
interest of the lowest degree. Her subjects were
rated on seven aspects or types of interest, including in- 
tellectual interests, social interests, activity interests, 
breadth of distinct interests, breadth of related interests, 
intensity of a single interest, and intensity of two or more 
interests. 

Kitson⁵ has also utilized the rating scale technique in
some of his studies of vocational interests. He has asked
his subjects to rate interest in their vocation on a numerical
scale by comparing it with interest in other occupations.
His “vocation-to-vocation” rating scale of interests
is reproduced below.

4 Cox, C. M., “The Early Mental Traits of Three Hundred Geniuses,”
Genetic Studies of Genius, Vol. II, Stanford University Press, 1926.

5 Kitson, H. D., “Measuring the Interest of Teachers in Their Work,”
Teachers College Record, Vol. XXX, No. 28, 1928.

VOCATION-TO-VOCATION RATING SCALE OF INTERESTS (Kitson)

Indicate, by making a check on this scale, 
the degree of interest you have in your occupation
(not the present job, but the occupation
itself). As the 100-degree point,
think of that activity in which you would 
spend the major portion of your time if 
you had a million dollars and were not 
obliged to work. Then check the point on 
the right of the scale which denotes the de- 
gree of your interest in your present occu- 
pation. 

3. Information tests. This method of measuring in- 
terests is based upon the assumption that knowledge and 
interest go together, that one will tend to inform himself 
about those things which interest him. Burtt, who 
worked out an Interest Test of Agricultural Engineering 
on this basis, states the theory back of such measurement 
as follows:⁶

There is some ground for the assumption that if a 
person is interested in a certain field he will pick up 
information about it — will be more familiar with the 
terminology and with less obvious details that would 
presumably be overlooked by a person who lacked that 
interest. Consequently, an information test may give 
some indication of interest if the items are carefully 
selected. 

We mention some examples of the use of this method 
of measuring interests. 

6 Burtt, H. E., Employment Psychology, Houghton Mifflin Co., Boston,
1926, p. 302.

(a) Mechanical interest tests. In our discussion of 
mechanical aptitude tests we noted that certain measure- 
ments in this field made use of information tests. Part 
II of the O’Rourke Mechanical Aptitude Test⁷ is essentially
an information test, and is based upon the assumption
that those with mechanical aptitude will be
interested in mechanical things and will, therefore, ac- 
quire information above the average about such things. 
Tests in the field of mechanical interests utilizing the 
information test method were also developed for use in 
the Army during the World War period, largely through 
the efforts of Toops and O’Rourke. Their “General 
Trade Interest Test” consisted of a number of one-word 
answer questions calling for information in the general 
field of mechanical trades. These tests were found very 
useful in the placement of soldiers and in the selection of 
those desiring to enter the Army trade schools. 

(b) Social interest tests. There are several instances 
in which information tests have been used as interest tests 
in the field of social measurements. The early form of 
the “Social Intelligence Test” (Moss, Hunt, and Om- 
wake) made use of a Social Information Test on the as- 
sumption that breadth of interests is related to one’s 
social intelligence and that this breadth can be measured 
by an information test. The material of this test was 
expressed in true-false form. The following are sample 
items: 

T F 1. The nickname of the Chicago Nationals is Red Sox.

T F 2. A white tie should be worn with a tuxedo suit.

T F 3. The term “right bower” is used in playing bridge.

T F 4. Coney Island is near New York.

T F 5. In hotels run on the European plan the charges include both room and meals.

T F 6. The composer of the opera “Faust” was Gounod.

T F 7. Membership in the American Automobile Association is limited to members elected each year.

T F 8. The Speaker of the House of Representatives is elected by members of the House from their own number.

T F 9. The Statler system refers to a chain of department stores.

T F 10. In federal elections for President the majority of votes cast in Tennessee are usually for the Democratic candidate.

T F 11. The Army and Navy football game is usually played at either West Point or Annapolis.

T F 12. Jane Cowl is a vaudeville actress.

7 See p. 171 for sample questions.

Ream in 1921 devised a “Social Relations Test” based
upon the same general assumptions that form the basis 
of the Social Information Test. His material is in mul- 
tiple-choice form, such as: 

2. In what organization is eleven o’clock of special
significance? Elks — Odd Fellows — Masons — Knights 
of Columbus. 

4. What is a caucus? A national political convention 
— An official county election — A meeting of politi- 
cians within a party — A secret political meeting in 
violation of the law. 

6. What kind of race is a derby? Trotting — Pacing — 
Running — Hurdling. 

(c) Vocational interests tests. An example of a test 
in this field is McHale’s Vocational Interest Test for College
Women. The test contains 247 questions in multiple-choice
form divided among four types of vocational




interests: law, business, medical sciences, and homemak- 
ing. 

(d) Terman’s information test of play interests.* This
test was used by Terman in his investigation of play in- 
terests of gifted children as a part of his extensive study 
of superior children. It is a multiple-choice test covering 
various types of games — social, solitary, active, quiet, 
etc. Questions are such as: 

1. You pick up jackstraws with a Magnet — Hook — 
Fingers. 

2. A game where you look for something hidden is I-spy 
— Old Witch — Roly-Poly. 

3. “Hearts” is played with Cards — Dice — Dominoes. 

Terman gives norms based upon several hundred cases 
for normal and genius children of both sexes. 

It is rather difficult to make any very conclusive gen- 
eralization as to the value of information tests as a meas- 
ure of interests. The information test possesses high 
reliability, in most instances considerably higher than for 
other methods of measuring interests. It has the advan- 
tage of being applicable to various types of interest test- 
ing. It can be adapted to testing general range of 
interests or to testing interests in specialized fields. In- 
formation tests as measures of interests are likely to fall 
short of the purposes for which they are devised in 
failing to sample extensively enough the field of testing, 
in measuring ability or experience rather than interests, 
and in putting too high a premium on abstract intelli- 
gence and verbal ability. The most carefully constructed 
tests of this nature, however, seem to possess a value 


* Terman, L. M., and others, “Mental and Physical Traits of a Thou- 
sand Gifted Children,” Vol. I, Genetic Studies of Genius, Stanford Uni- 
versity Press, 1926. 





worthy of retaining the method in further studies of 
measurement of interest. 

4. Association tests. Association tests of the type 
already described in Chapter VI have been used as a 
means of indicating a person’s interests. These tests are 
based upon the assumption that a person with a particu- 
lar kind of interest will be likely to respond to a stimulus 
word, in a large percentage of cases, with a particular 
kind of response indicative of his interest. Provided a 
long enough list of stimulus words is presented, such a 
test may give a fairly reliable indication of the general 
field of interest. The most extensive investigation of in- 
terests by this method of testing has been carried on by 
Wyman. She devised an interest test suitable for young 
people between the ages of eight and fifteen, for studying 
interest reactions in three fields: intellectual, social, and 
activity interests. By intellectual interest, she had in 
mind interest in “knowing — in getting the meaning of 
things.” By social interest, she meant interest in persons. 
Activity interest she considered to be interest in doing 
things — interest in being the leader. Her association test 
consists of 120 stimulus words to be presented verbally 
with responses to be given in writing. The words have 
been balanced equally for responses in intellectual, social, 
and activity fields of interest. Her test list is repro- 
duced on page 201. Either list of 60 words may be used 
for a short test. 

The real value and the validity of an association test 
for interests depend upon the care and accuracy with 
which the scoring key is prepared. Wyman’s method 
makes use of a technique similar to that used by Strong 
in assigning score weights to the various responses in his 
inventory. The various responses to the stimulus words 
are assigned score weights for each of the interest groups 





by comparing the responses of a trial group with the in-
terest with those of a group without it. For example, 
a particular response to a stimulus word receives a score 
weight for intellectual interest in proportion to the degree 
to which that response differentiates individuals of known 
intellectual interests from individuals known to be lack- 
ing in such interests. As is true in scoring the responses 
of the interest inventories, the working out of scoring 

WYMAN STIMULUS WORDS FOR TESTING INTELLECTUAL,
SOCIAL, AND ACTIVITY INTERESTS *

 1. summer        31. evening            1. night         31. sundown
 2. easy          32. hard               2. simple        32. difficult
 3. diamond       33. ring               3. gem           33. dress
 4. tire          34. play               4. join          34. enjoy
 5. dog           35. learn              5. control       35. need
 6. fair          36. band               6. white         36. music
 7. school        37. dark               7. college       37. black
 8. help          38. platform           8. protect       38. stage
 9. nature        39. pity               9. sky           39. watch
10. active        40. thrill            10. restless      40. excite
11. dream         41. idle              11. wonder        41. useful
12. shock         42. hero              12. fault         42. castle
13. joy           43. vacation          13. pleasure      43. holidays
14. dislike       44. master            14. detest        44. captain
15. nut           45. bat               15. paper         45. rod
16. go            46. fun               16. travel        46. mischief
17. angel         47. power             17. princess      47. rain
18. nice          48. interested        18. alone         48. interesting
19. water         49. fond              19. current       49. good
20. boy           50. trip              20. girl          50. journey
21. wish          51. make              21. desire        51. form
22. museum        52. yard              22. history       52. island
23. delight(ed)   53. aim               23. contented     53. try
24. work          54. fairy             24. train         54. giant
25. cave          55. exercise          25. adventure     55. game
26. pleasant      56. companion         26. happy         56. friend
27. house         57. career            27. marble        57. science
28. imagine       58. fire              28. invent        58. camp
29. range         59. like              29. country       59. prefer
30. admire        60. great             30. attract       60. grand


* Wyman, J. B., “The Measurement of Interest,” Vocational Guidance
Magazine, 1929, Vol. VIII, p. 64.




methods for association tests of interests is a tremendous 
job. 
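
The weighting principle just described can be sketched in a few lines of Python. The sketch is only an illustration under stated assumptions. It takes the weight of a response to be the proportion of the interest group giving that response minus the proportion of the contrast group giving it. This is one simple reading of "in proportion to the degree to which that response differentiates" the two groups, and it is not offered as the formula actually used by Wyman or by Strong. The function name `response_weights` and the sample responses to the stimulus word "museum" are hypothetical.

```python
from collections import Counter


def response_weights(with_interest, without_interest):
    """Weight each response by how much more often the interest group gives it.

    Assumption: weight = proportion in interest group minus proportion in
    contrast group. This is an illustrative rule, not Wyman's published key.
    """
    n_with = len(with_interest)
    n_without = len(without_interest)
    counts_with = Counter(with_interest)
    counts_without = Counter(without_interest)

    weights = {}
    for response in set(counts_with) | set(counts_without):
        p_with = counts_with[response] / n_with
        p_without = counts_without[response] / n_without
        weights[response] = round(p_with - p_without, 2)
    return weights


# Hypothetical responses of two small trial groups to the word "museum".
intellectual_group = ["history", "history", "learn", "fossils", "trip"]
contrast_group = ["trip", "trip", "boring", "history", "trip"]

weights = response_weights(intellectual_group, contrast_group)
for response, weight in sorted(weights.items()):
    print(response, weight)
```

A positive weight marks a response given more often by the interest group, and a negative weight marks one given more often by the contrast group. With 120 stimulus words, three interest fields, and many possible responses to each word, it is easy to see why the preparation of such a key is laborious.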

Wyman’s test possesses a satisfactory reliability, as 
shown by reliability coefficients between .80 and .90 when 
the test is scored for the three types of interests. The 
validity of the method has been studied by correlating 
the interest scores with teachers’ estimates of dominant 
interests in pupils. Such correlations reported by the 
author of the test are generally between .45 and .65. 
These seem as satisfactory as most validity coefficients 
based upon personal estimates. At best, they furnish a 
questionable basis for validation, but it is difficult to 
obtain a more satisfactory criterion with which to com- 
pare the test scores. 
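
A reliability coefficient of this kind is a correlation between two sets of scores obtained with the same test, and a validity coefficient is a correlation between test scores and a criterion such as teachers' estimates. The following sketch computes the ordinary product-moment correlation for two short lists of scores. The scores are invented for illustration and are not Wyman's data, and the text does not say which procedure (retest, alternate lists, or some other) gave her coefficients.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equally long lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variation_x = sum((a - mean_x) ** 2 for a in x)
    variation_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / sqrt(variation_x * variation_y)


# Hypothetical scores of six pupils on two applications of the same test.
first_testing = [12, 15, 9, 20, 17, 11]
second_testing = [13, 14, 10, 19, 18, 10]

reliability = pearson_r(first_testing, second_testing)
print(round(reliability, 2))
```

The same function serves for a validity coefficient if the second list is replaced by the criterion ratings.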

Wyman has studied several questions bearing a rela- 
tionship to the use of this testing device. She has stud- 
ied the relationship between the interest scores and intel- 
ligence and achievement test scores. The former give 
an average correlation of .48 with intelligence test scores 
and an average correlation of .50 with Stanford Achieve- 
ment Test scores. Working out partial correlations be- 
tween interest scores and achievement scores holding 
intelligence constant, Wyman found correlations as follows:
for intellectual interests, .49; for social interests,
.18; and for activity interests, .03. The correlation of
.49 for intellectual interests is three points higher than
the correlation between intellectual interests and intelligence
scores, which might show the importance of intellectual
interests as well as ability in achievement.
Wyman also made a study of the permanence of interests 
as measured by her test. The average correlation for 
the three fields of interests tested the second time, after 
a period of five years, is .28. It would seem that interests 
in these three fields lack any high degree of permanence 





in school children. This is in substantial agreement with 
studies of permanence of interests using other methods 
of measurement. 
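
The partial correlations mentioned above, in which intelligence is held constant, follow the usual first-order formula: the correlation between interest and achievement, less the product of each variable's correlation with intelligence, divided by the square root of the product of one minus each of those squared correlations. A minimal sketch is given below. Only the value .46 for intellectual interest with intelligence is taken from the text (it is three points below .49). The other two coefficients are invented for illustration, since the text does not report them, and the function name `partial_correlation` is hypothetical.

```python
from math import sqrt


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of x and y with z held constant (first-order partial)."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# x = intellectual interest, y = achievement, z = intelligence.
r_interest_achievement = 0.60    # assumed for illustration, not from the text
r_interest_intelligence = 0.46   # inferred from the text (.49 less three points)
r_achievement_intelligence = 0.50  # assumed for illustration, not from the text

partial = partial_correlation(
    r_interest_achievement,
    r_interest_intelligence,
    r_achievement_intelligence,
)
print(round(partial, 2))
```

When the third variable is unrelated to both of the others, the partial correlation reduces to the ordinary correlation, which is a convenient check on the formula.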

The following quotation from Fryer seems a fitting 
conclusion to our whole discussion of the measurement of 
interests: 

The present state of development in the measurement 
of interests corresponds to an early stage in the meas- 
urement of abilities. Twenty years ago there was one 
impressive scale for the measurement of abilities, the 
Binet-Simon Scale, and many other experimental tests. 

In the measurement of interests, today, there is one out- 
standing measuring scale, Strong’s “Vocational Interest 
Blank,” with its scoring keys for various occupations, 
and many other experimental devices which may prove 
satisfactory in the future. 

There is, however, an important difference to be noted 
in this comparison of development in the two fields of 
measurement. The measurement of abilities had 
achieved an objective basis by 1908. Today the objec- 
tive measurement of interests is still largely in the ex- 
perimental stage. It is in the measurement of subjec- 
tive interests that there has been developed a technique 
which is as well-defined as in the Binet-Simon Scale 
of 1908 and more complicated. It may be that a sec- 
ond decade will establish for the field of interests an 
objective measure. If so, the subjective measures will, 
of course, go into the discard, unless they contribute 
something additional to the measurement. 

Abilities have always been predominately objective in 
their definition, even prior to their measurement, while 
the early conceptions of interests are all subjective. In 
the field of interests there is the additional problem of 
finding out exactly what are objective interests, or what 
is the objective aspect of what are known as subjective 


Fryer, Douglas, The Measurement of Interests, Henry Holt and
Company, New York, 1931, p. 326. 





interests, so that the measurement of interests may become
an objective problem.

In the statistical treatment of the measuring scales, 
however, the field of interests is far ahead of the field 
of abilities of 1908. We know better, today, how to 
find out the value of an inventory or test than we did 
twenty years ago. The application of statistical meth- 
ods to the measurement of interests, particularly to the 
subjective measures, shows that the scales and scoring 
keys are of suggestive value — not as suggestive as are
measures of abilities today, but fully as suggestive as 
the ability measures prior to the World War, just before 
the publication of Yerkes, Bridges, and Hardwick’s 
Point Scale in 1915 and Terman’s revision of the Binet-Simon
Scale in 1916. The validity coefficients of interest
measures, in comparison with those of the measures of 
abilities today, are promising, and the applications of 
the interest inventories are as well advanced as the ap- 
plications of ability scales in 1915. 



Part IV 


MEASUREMENT OF ACHIEVEMENT 




Chapter XII 


The Measurement of Achievement 
in Schools 


I. What Are Tests of Achievement? 

An achievement test, theoretically, measures
an accomplishment. It differs from an intelligence 
test, such as we have just considered, in that it measures 
the achievement without reference to the capacity behind 
the achievement, whereas the intelligence test aims at 
measuring the capacity. As contrasted with the intelli- 
gence test, which measures primarily native capacity or 
ability, the achievement test measures primarily some- 
thing that has been learned or acquired. It is unfortu- 
nately rather confusing for the student of school tests that 
our intelligence tests are so often indirect measures that 
utilize, as the medium of showing one’s mental ability, 
fundamental material which has been acquired in school. 
But the intelligence tests do not aim at finding out how 
much the educational accomplishment is; in fact we 
utilize as the medium of intelligence testing only that 
educational content which is rather far behind the in- 
dividuals being tested and which can be assumed to be 
more or less constant for all being measured by the test. 
For example, we do not use reading material to measure 
the intelligence of the first-graders; we use pictures or 
some non-verbal material. But for high school students, 



all of whom we can assume have had a fundamental 
training in the mechanics of reading, we can use a verbal 
medium for measuring their mental powers. 

The need for reliable devices or instruments for meas- 
uring achievement arises in many problems involved in 
dealing with human beings. Throughout the educational 
world, pupils’ achievements in the various school sub- 
jects must constantly be measured for purposes of moti- 
vating learning, promoting to higher classes or schools, 
assigning honors and credits, and guiding educationally 
and vocationally. In the factory or shop, achievements 
in the form of mechanical skills must often be measured. 
In the office, achievement measures are in terms of
clerical efficiency or typing ability or stenographic 
ability. Teachers are year after year being measured in 
some way to indicate teaching efficiency. For far too 
few of these and other needs for measuring instruments 
for achievements have reliable devices been worked out. 
There are numerous achievement tests for school sub- 
jects, many of which are very satisfactory; there are a 
few achievement tests of an informational type for knowl- 
edge of a vocational sort; there are a few standardized 
trade tests for more mechanical accomplishments or 
skills; and there are attempts at working out rating 
scales for measuring various total job efficiencies, most of 
which have not proved reliable in use. This chapter will 
be concerned chiefly with the first of these four groups; 
the second and third are patterned largely after the first; 
and the last — the rating scales — will be considered with 
our later discussions. 

II. Psychology’s Contribution to Achievement Testing 

Psychology can be given much of the credit for demonstrating
the unreliability of the older methods of measuring
achievement. It has pointed the way to the development
of improved instruments of measurement. As
early as 1912 and 1913, Starch and Elliott¹ published
articles in the School Review on reliability of grading in 
various high school subjects, in which they demonstrated 
that with the ordinary type of academic examinations 
then in vogue, achievement measurements could not be 
relied upon to be accurate. In one of these studies, the 
authors selected a final examination paper in geometry 
written by a high school student. A reproduction of this 
paper, with the set of examination questions, was sent to 
180 high schools in the North Central Association, with 
the request that it be graded according to the practice 
and standards of the school by the principal teacher of 
mathematics. In 116 replies obtained, two assigned a 
mark above 90 and one below 30. Forty-seven who 
graded the paper assigned a mark of passing (75) or over; 
69, below passing. 

More recent investigations of similar measuring in- 
struments corroborate the findings of the earlier studies. 
Tiegs reports a study of the grading of a seventh-grade 
physiology test of five questions by a group of teachers 
from Los Angeles schools, who were assembled for the 
purpose of studying tests and measurements. Thirty- 
one teachers assigned to the paper percentage marks 
ranging from 20 to 90, distributed as shown in Table XX. 
As Tiegs states: “These results would indicate that the 
failure or success of this boy depended upon the teachers 


1 Starch, D., and Elliott, E. C., “Reliability of Grading High School
Work in Mathematics,” School Review, 21: p. 254, April 1913; “Reliability
of Grading High School Work in English,” School Review, 20:
p. 442, Sept. 1912; “Reliability of Grading High School Work in History,”
School Review, 21: p. 676, Dec. 1913.

2 Tiegs, Ernest W., Tests and Measurements for Teachers, Houghton 
Mifflin Co., Boston, 1931, p. 21. 





fate gave him, rather than upon his knowledge of physi- 
ology or his answers to the questions asked.” 

Table XX

PER CENT MARKS ASSIGNED
TO PHYSIOLOGY EXAMINATION

Per Cent Marks    Frequency

    90-100             4
    80-89              8
    70-79              3
    60-69              2
    50-59              9
    40-49              4
    30-39              0
    20-29              1
                      --
    Total             31

Such studies as these are all indictments against 
written examinations of the traditional or essay type. 
They violate the first rule for a good measuring stick — 
that it give the same results every time applied for meas- 
uring the same thing, no matter who applies it. And 
yet for decades they have been used as measuring instru- 
ments, often with a mistaken confidence in their exact- 
ness and accuracy of measurement. 
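
The disagreement recorded in Table XX can be put in a single figure by counting how many of the thirty-one markers would have passed the paper. The short sketch below does this from the table's frequencies. The passing mark of 70 is an assumption made only for the illustration, since the text does not state the passing mark used in the Los Angeles schools.

```python
# Frequencies from Table XX, keyed by (lowest mark, highest mark) of each interval.
table_xx = {
    (90, 100): 4,
    (80, 89): 8,
    (70, 79): 3,
    (60, 69): 2,
    (50, 59): 9,
    (40, 49): 4,
    (30, 39): 0,
    (20, 29): 1,
}

PASSING_MARK = 70  # assumed for illustration; not given in the text

total_markers = sum(table_xx.values())
passing = sum(
    frequency
    for (lowest, _highest), frequency in table_xx.items()
    if lowest >= PASSING_MARK
)
share_passing = passing / total_markers

print(f"{passing} of {total_markers} markers pass the paper "
      f"({share_passing:.0%})")
```

On this assumption the same paper is passed by about half of the teachers and failed by the other half, which is the point of Tiegs's remark.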

Whatever pedagogical or other values we may justly 
assign to this type of examination, its shortcomings dis- 
qualify it as a measuring instrument. The same criti- 
cisms and the same disqualifications apply also to many 
other schemes for measuring achievements outside the 
classroom, as the schemes for rating efficiency of teachers 
and workers. 

On a second score, psychology deserves the chief credit 
for the working out of new methods of measuring achieve- 
ment. In the classroom these center around the devel- 
opment of “short-answer” type examination questions. 




In achievement testing, these were first utilized in the 
construction of rather formal tests in the specific school 
subjects, which were standardized, printed, and supplied 
by test distributing centers for survey testing and other 
formal programs of measurement. The newly developed 
intelligence tests furnished the pattern for the types of 
questions used in the achievement tests. Close upon the 
introduction of the formal standardized achievement tests 
came the appreciation of the merits of objective types 
of examination questions for informal classroom use, and 
now in many schools much of day-by-day or week-by- 
week measurement is done by objective tests constructed 
by the instructor or teacher. As educational workers 
become more trained in the principles and technique of 
measurement, the use of these informal objective achieve- 
ment tests will probably increase. 

We have already met many of the objective “short- 
answer” type questions in our previous considerations. 
For completeness in our present discussion, the chief 
types are listed and defined. For actual sample ques- 
tions, the reader is referred to a subsequent part of this 
chapter. 

1. The true-false question. This type of question 
consists of a true or a false statement, the truth or falsity 
of which is to be indicated in some way by the person 
answering the question or item. Particular advantages 
of this type of question include relative ease of construction,
suitability for a wide variety of material, and quickness
with which it can be answered, making it possible
to cover a wide field in a short testing time. 

2. The multiple-choice question. This type presents 
to the testee a question or a problem with several (usu- 
ally three to five) suggested answers or solutions, only 
one of which is correct. The testee answers the question 





by an indication of his choice among the answers. This 
type of question has found particular use when it is 
desired to test reasoning and judgment rather than 
simple information, although the test form can be used 
for testing the latter also. Contrasted with the true- 
false form of question, the multiple-choice question is 
somewhat more difficult to construct and is more time- 
consuming from the standpoint of answering. 

3. The completion question. This form of test 
question usually requires a one-word or at most a short- 
phrase response from the testee. It may be in the form 
of a sentence with an omitted word; or it may be in the 
form of a question which can be answered by a word, or 
at most a few words. This type of question is in contrast 
to the two forms that have already been mentioned, in 
that it requires recall on the part of the testee, rather 
than simply recognition. 

4. The matching test. This type of test usually 
consists of two columns of items, each item in one column 
to be matched with an appropriate item in the other 
column. For example, a column of dates may be 
matched with a list of historical events; or a column of 
authors’ names may be matched with a list of books. 
This type of test has a limited usefulness as compared 
with other types discussed, because of its unsuitability 
for many kinds of material on which tests are to be 
constructed. 

In addition to these forms, which meet most of the 
needs of testing, the student will come across other types 
of short-answer questions in special situations. Among 
them may be mentioned (1) identification questions, 
where parts of drawings or photographs are to be indi- 
cated; (2) questions which involve the detection and 




correction of errors; (3) classification questions, in which 
various items are to be classified under a given code or 
classification scheme; and (4) arrangement in correct 
order of misarranged events of history or steps in a 
procedure. 
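
What makes these short-answer forms objective is that each response is compared with a fixed key, so that the score does not depend on who does the scoring. The following sketch shows the idea for a true-false test scored as the number of items right. The key, the pupil's answers, and the function name `score_test` are hypothetical, and the sketch makes no allowance for guessing.

```python
def score_test(key, answers):
    """Count the items on which the pupil's answer agrees with the key.

    Omitted items are simply counted as not right.
    """
    return sum(1 for item, correct in key.items() if answers.get(item) == correct)


# Hypothetical five-item true-false key and one pupil's answer sheet.
key = {1: "T", 2: "F", 3: "T", 4: "F", 5: "T"}
pupil_answers = {1: "T", 2: "F", 3: "F", 4: "F"}  # item 5 was omitted

score = score_test(key, pupil_answers)
print(score)
```

The same key applied to the same answer sheet always yields the same score. A multiple-choice or matching test can be scored in the same way by storing the number of the correct alternative in place of T or F.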

III. Examples of Achievement Tests for Schools 

There have been literally hundreds of tests constructed 
and standardized for the various school subjects. We 
shall examine some typical ones. No claim is made that
the examples selected are the best in the field; they are, 
however, typical in nature of the material, have been well 
standardized, and have generally been widely used. 

1. An elementary school reading test. The Haggerty 
Reading Examinations (Sigma 1 for primary grades and 
Sigma 3 for upper grades) are among the best known and 
most widely used tests in elementary school achievement. 
The primary test consists of two parts, the first in the 
nature of a directions test involving both words and 
pictures, and the second a true-false test calling for 
interpretation of simple sentences. The whole test in- 
cludes 45 items, requiring about a half hour for answer- 
ing. The test is thoroughly objective in scoring and has 
a high reliability. The author reports a reliability coef- 
ficient of .84 obtained by correlating the results of two 
testings on a group of 200 children tested six weeks apart. 
The validity of the examination has been demonstrated 
by numerous studies which show that it agrees well with 
other reliable estimates or measurements of scholastic 
ability in reading. Age and grade standards are avail- 
able for interpretation of scores made on the test. A few 
samples are given from the test to show its nature. 



HAGGERTY READING EXAMINATION — SIGMA 1 ³


Part 1 



6. Make two lines under the big 
bubble that is in the air. 

7. Put a cross above the pipe in 
the girl's hand. 


Part 2 


 1. Can you eat?                              no   yes
 2. Can a hat walk?                           no   yes
 3. Can a clock talk?                         no   yes
11. Does flour come from milk?                no   yes
12. Is every man a soldier?                   no   yes
13. Are dresses sometimes made of gingham?    no   yes


2. A high school literature test. The George Wash- 
ington University Series of standardized tests contains a 
comprehensive test of high school literature by Omwake, 
Schwarz, and Ronning. This test is divided into three 
parts: 50 questions in multiple-choice form, 90 in true- 
false form, and 20 in matching form. The authors state 
that construction of the test was based upon an analysis 
of content of English literature courses as stated in cata- 
logues and bulletins of 22 representative states and 15 
larger cities of the United States. A reliability coefficient
of .90 is reported, based upon retest of 98 high
school seniors. Validity of the test is indicated by close 
agreement of test scores with other objective tests in 
English and with school grades in literature courses, and 


3 Copyright by the World Book Company, Yonkers-on-Hudson, New
York. Reprinted by written permission of the publishers. 




by the increase in scores from year to year in the high 
school with increasing study of literature. 

ENGLISH LITERATURE TEST ⁴

Part 1 

Directions. Write the number of the best answer
on the line at the right.

1. Lady Macbeth sent the guests away from
the banquet because (1) she was frightened
by the ghost of Banquo (2) Macbeth was
very ill (3) she feared for the safety of her
guests (4) she wanted to prepare for the
murder (5) she feared Macbeth would betray
his crime

2. Francis Bacon’s chief contribution to literature
was in the form of (1) drama (2)
essays (3) poetry (4) criticism (5) letters

3. Burke considered his proposed plan for
dealing with the colonies (1) philanthropic
(2) impracticable (3) expedient (4) revolutionary
(5) temporary

4. Poe’s short stories abound in (1) reverence
(2) horror (3) humor (4) patriotism (5)
romance

5. The poetry of Burns is noted chiefly for
(1) its matchless English style (2) its accurate
pictures of society life (3) its
classical allusions (4) its faultless form
(5) its sympathetic treatment of homely,
every-day themes

6. “Robinson Crusoe” is noted for its (1)
adventure (2) satire (3) mystery (4)
hidden meanings (5) elaborate style

7. Samuel Pepys is remembered chiefly because
of his (1) plays (2) letters (3)
diary (4) essays (5) poetry

4 Quoted by permission of Center for Psychological Service,
Washington, D. C.





8. The Black Knight is a character in (1) A
Tale of Two Cities (2) The Last Days of
Pompeii (3) The Black Cat (4) Ivanhoe
(5) The Last of the Mohicans

9. “Sohrab and Rustum” is a (1) satire (2)
drama (3) comedy (4) pastoral (5) narrative
poem

10. The early authority on fishing was (1)
Isaak Walton (2) Thomas Huxley (3)
Samuel Pepys (4) Daniel Defoe (5)
Henry Thoreau


Part 2 

Directions. If the statement is true, encircle the T;
if it is false, encircle the F.

T F 1. “Il Penseroso” describes the charms of a
merry social life. 

T F 2. The hero of “Gulliver’s Travels” visited 
a country inhabited by men six inches 
tall. 

T F 3. Macbeth desired the death of Banquo 
because he was jealous of the military 
achievements of Banquo. 

T F 4. “Pilgrim’s Progress” is one of the great- 
est prose allegories in literature. 

T F 5. Most of Bret Harte’s stories deal with 
high society life. 

T F 6. Samuel Johnson is noted especially for 
his lyric poetry. 

T F 7. In Shakespeare’s time women’s parts 
were played by boys. 

T F 8. In his poem “The Bells,” Poe described 
the process of making bells. 

T F 9. George Bernard Shaw is a well-known 
dramatist of modern times. 

T F 10. Caliban in “The Tempest” was a kindly 
spirit. 





Part 3 

Directions. Place on the line preceding each writer
in Section B the number of the writer in Section A of
whom he is a contemporary. For example, the number
5 is placed on the line preceding Boswell because he is
the contemporary of Johnson, whose number in Section A
is 5.

Section A              Section B

1. Chaucer             __5__ Boswell, James
2. Shakespeare         _____ Addison, Joseph
3. Milton              _____ Bacon, Francis
4. Pope                _____ Barrie, James
5. Johnson             _____ Browning, Robert
6. Wordsworth          _____ Burke, Edmund
7. Tennyson            _____ Coleridge, Samuel
8. Kipling             _____ Defoe, Daniel


3. A college chemistry test. We may select another 
achievement test from the George Washington University 
series — one which is a good illustration of an objective 
test in the college field. The one which has been selected 
for our example is a test in general college chemistry. 
This particular chemistry test was used in an extensive 
study of measurement of teaching efficiency which we 
shall have occasion to refer to again later on in this 
book.⁵ The test is well standardized, norms being given
for students who have had a year of college chemistry 
both with and without a background of high school chem- 
istry. The 150 points of the test are distributed in five 
different parts, which are indicated by the sample ques- 
tions below: 


5 See Chapter XIV.




GENERAL CHEMISTRY TEST ⁶

Part 1 

Directions. In the statements below, certain words
and numbers have been replaced by lettered blanks. 
Insert the omitted words or numbers in the correspond- 
ing blanks at the right. 

I. Avogadro’s hypothesis states that
equal volumes of gases at the
same temperature and pressure
contain an equal number of
__(a)__.                                  (a) ________

II. The gram-molecular weight of a gas
at standard pressure and temperature
occupies a volume of
__(b)__ liters.                           (b) ________

Part 2 

Directions. If the statement is true, encircle the T;
if it is false, encircle the F.

T F 1. Mendeleef outlined the periodic relations 
of the elements. 

T F 2. Hydrogen is generally prepared in the 
laboratory by the reaction of an acid and 
a metal. 

T F 3. An element always passes from one form 
of combination to another without change 
of valence. 

T F 4. Carbon dioxide is made in the laboratory 
by the reaction of hydrochloric acid and
calcium carbonate. 

T F 5. Limestone is an important original source 
of calcium. 


6 Quoted by permission of Center for Psychological Service,
Washington, D. C.





Part 4

Directions. Complete and balance the following 
equations. 

H₂SO₄ + 2NaOH →

Zn + 2HCl →

CaC₂ + 2H₂O →

Part 5 

Directions. Solve the following problems. Place 
the answer on the line at the right. Use the left margin 
for any figuring needed. 

1. One volume of oxygen combines with two
volumes of hydrogen to form how many
volumes of H₂O in gaseous form?

2. What volume of 0.5 normal solution of 
HCl would be required to neutralize 
1500 cc. of 2.0 normal NaOH solution? 

4. Objective measurement of school achievement by 
product scales. Certain school subjects which must be 
graded or rated as frequently as any other subjects do not 
lend themselves to the type of measurement that we 
have just illustrated by several sample tests. Such sub- 
jects are handwriting, composition, drawing, sewing, 
and other less frequently rated ones of a similar 
nature. Tests cannot be composed for these perform- 
ances or skills in terms of so many questions to be 
answered, the answers to which can be easily scored as 
right or wrong. Merit in such subjects clearly depends 
upon quality of a total performance, and an achievement 
grade or rating should be indicative of the degree of this 
quality. Quality judgments left to the free decision of 
the teacher or rater usually have all the disadvantages 
of ratings derived from grading essay-type examination 
questions. Hence scales for the more accurate grading 




of subjects like handwriting and composition have been 
developed. Some of these scales, in fact, antedate most 
of the test work that has been done in terms of short- 
answer type test material. Thorndike’s Handwriting 
Scale appeared in one edition as early as 1909, and Ayres’ 
Scale as early as 1912. 

Generally the various product scales consist of several 
samples of performance which have been carefully 
selected, graded, and scaled with a range in quality from 
very poor to very good. Thus, the Ayres Measuring 
Scale for Handwriting consists of eight samples of handwriting
graded from 20 to 90. The Willing Composition
Scale consists of eight samples of compositions graded 
also from 20 to 90. The Murdock Sewing Scale contains 
photographs in three views of fifteen sewing samples. 
Students’ performances in these various subjects are 
graded by comparison with the scale samples. The use 
of the scales insures greater objectivity of rating ; estab- 
lishes greater uniformity of ratings given at various 
times; and establishes standards for grading that can be 
compared from teacher to teacher and from school to 
school. An illustration from one of these scales is given 
in Fig. 16." 

5. Comprehensive achievement tests and classification 
tests. Should one desire to measure the whole educa- 
tional achievement over a considerable period of school 
training, there are several standardized achievement 
tests covering the whole high school or elementary school 
field. These are sometimes referred to as classification 
tests because of their usefulness in classifying pupils into 
sections on the basis of their educational achievement 
and intelligence. 


[Fig. 16. Illustration from one of the product scales; figure
not reproduced here. (Script of original is blue.)]

7 Reproduced by permission of the Russell Sage Foundation, New
York, N. Y.

In the high school field the Iowa High School Content 
Examination, devised by Ruch, and the Sones-Harry 
High School Achievement Test are typical. Both of 
these tests contain four parts covering, respectively, 
English and literature, mathematics, science, and the 
social sciences. 

Probably the most used of the general achievement 
tests in the elementary school field is the Stanford 
Achievement Test, devised by Kelley, Ruch, and Ter- 
man. This is published as a Primary Examination and 
an Advanced Examination. The Advanced Examination 
contains parts on the following subjects: Reading - Paragraph
Meaning; Reading - Word Meaning; Dictation
Exercise; Language Usage; Literature; History and 
Civics; Geography; Physiology and Hygiene; Arithmetic 
Reasoning; and Arithmetic Computation. As a test of 
achievement the Stanford Examination has many ad- 
vantages. It is published in several forms; its reliability 
and validity have been established by thorough studies; 
and reliable norms are available for both age and grade. 
A feature of this test as well as of some other educational 
achievement tests is the possibility of deriving from the 
test standards a measure of educational attainment com- 
monly designated as “Educational Age.” For example, 
if a pupil’s total score on the test is equivalent to the 
average or standard for age 10 years 6 months, we may 
say that his Educational Age is 10 years 6 months. This 
gives a measure of educational achievement which has 
more meaning than simply the raw test score. The use 
of Educational Ages also permits the calculation of Edu- 
cational Quotients by dividing the pupil’s Educational 
Age by his chronological age. Such quotients are similar 
in meaning to intelligence quotients based on intelligence 
tests using mental age standards. They give us in one
measurement a relative rating of the child’s educational 
attainments; or, in other words, they show us in one 
quantitative rating whether or not the child is up to the 
educational expectations for his age. If he is, his Edu- 
cational Quotient will be 1.00 or more; if he is not, it 
will be below 1.00. Accomplishment Quotients are also 
sometimes worked out by a combined use of educational 
and intelligence tests. Mathematically, Accomplishment 
Quotient equals Educational Quotient divided by Intelli- 
gence Quotient. In general such a measure indicates 
whether the pupil is attaining in educational accomplish- 
ment what he is capable of in accordance with his intelli- 
gence. If he is, his Accomplishment Quotient should be 
above the 1.00 line; if he is not, it should be below. 
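
The two quotients just described reduce to simple ratios. The following is a minimal sketch in Python and is not part of the original tests. The function names, the pupil's chronological age of 10 years 0 months, and his mental age of 11 years 0 months are assumed for illustration, and the Intelligence Quotient is taken as mental age divided by chronological age on the same 1.00 scale.

```python
def in_months(years, months=0):
    """Convert an age in years and months to months."""
    return years * 12 + months


def educational_quotient(educational_age, chronological_age):
    """Educational Quotient = Educational Age / chronological age."""
    return educational_age / chronological_age


def intelligence_quotient(mental_age, chronological_age):
    """Assumed definition: mental age / chronological age (1.00 scale)."""
    return mental_age / chronological_age


def accomplishment_quotient(eq, iq):
    """Accomplishment Quotient = Educational Quotient / Intelligence Quotient."""
    return eq / iq


# The Educational Age of 10 years 6 months is the example in the text;
# the chronological and mental ages are hypothetical.
ea = in_months(10, 6)
ca = in_months(10, 0)
ma = in_months(11, 0)

eq = educational_quotient(ea, ca)
iq = intelligence_quotient(ma, ca)
aq = accomplishment_quotient(eq, iq)

print(f"EQ = {eq:.2f}, IQ = {iq:.2f}, AQ = {aq:.2f}")
```

Under these assumptions the pupil is ahead of the expectation for his age (Educational Quotient above 1.00) but somewhat short of what his intelligence would lead us to expect (Accomplishment Quotient below 1.00). Because the chronological age cancels, the Accomplishment Quotient is simply Educational Age divided by mental age.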

IV. The Nature of Short-Answer Achievement Tests 

In closing this discussion of achievement tests, let us 
briefly consider the advantages which have given rise to 
the widespread use of the short-answer questions. Al- 
most all the advantages are either directly or indirectly 
related to the objectivity of the questions. They are said 
to possess this quality because they can be scored or 
graded without the play of opinion or subjective judg- 
ment on the part of the scorer or grader. Each question 
is definitely right or wrong and there is not the necessity 
of making a decision as to the relative merits of each 
answer given. 

Other advantages which are frequently pointed out for 
“short-answer” achievement tests include the following: 

(a) Short-answer tests are reliable. They give the 
same or very nearly the same results when applied at 
different times to the same group. They give the same 
results when administered by different examiners or 
teachers. They give the same results when scored or
graded at different times by the same grader. And, 
finally, results on the tests are the same when scoring or 
grading is done by different teachers or graders. 

(b) Short-answer tests make possible the construction 
of equivalent examinations. Many occasions present 
themselves in the educational world in which it would be 
highly desirable, we might think even necessary, to have 
several examinations of equivalent difficulty. Such oc- 
casions arise in the comparisons made from year to year 
in schools and classes; in the administration of state ex- 
aminations; and in the giving of examinations for such 
things as college entrance. We cannot adequately com- 
pare a sixth grade this year with a sixth grade next year 
unless they take equivalent examinations or examinations 
whose difficulties are definitely known. We cannot be 
fair if one year we give a state examination which fails 
only 10 per cent of the testees and next year one which 
fails 30 per cent of an equal-ability group. Ben Wood⁸
some time ago made a study of examinations (traditional 
type) given by the College Entrance Examination Board. 
He says: 

Large and unrecognized differences were found be- 
tween the difficulties of the examinations which were 
thought to be equal. According to the reports of the 
Secretary of the College Entrance Examination Board, 
the percentages of failures in algebra for the years 1916 
to 1921 inclusive have been as follows: 

Year                      1916   1917   1918   1919   1920   1921
Percentage of failures    61.5   36.7   25.3   61.3   26.1   28.5

These variations are almost certainly due to differences
in the difficulties of the examinations used.

8 Wood, B. D., “Measurement of College Work,” Educational Administration
and Supervision, 7: p. 301, Sept. 1921.

Because of the subjectiveness of the traditional type
of examination question, it would be impossible even 
with prior statistical study of the questions to construct 
equivalent examinations. However, when the need 
arises, a minimum of preliminary analysis of objective 
question material will insure fairly equivalent examina- 
tion material in various tests. 

(c) Definite standards can be set on objective tests. 
These may be averages of attainment for various grades, 
for various ages, for groups taught by different methods, 
for those pursuing different courses of study, etc. Such 
standards, if set, mean little in traditional essay-type 
examinations, because if a class fails to meet a stated 
standard, we are not sure whether the failure represents 
a lack of attainment among the pupils or too strict grad- 
ing on the part of the teacher of the class. 

(d) Short-answer type of tests are comprehensive in 
nature. Because of the quickness with which the ques- 
tions can be answered, and therefore the number that can 
be asked, they are likely to represent a much better sam- 
pling of the field being tested. 

(e) Short-answer tests readily show pupils where they 
are wrong. For this reason they may be said to possess 
considerable pedagogical value, though their critics have 
often praised the traditional type of examination for its 
greater pedagogical value as compared with the short- 
answer examination. It is true that short-answer ex- 
aminations do not show where the reasoning went wrong 
in incorrect responses, but this can hardly be claimed in 
most instances for any other type of examination. 

(f) Short-answer tests are easily scored. Many of
them do not even require a knowledge of the subject on 
the part of the scorer. A clerically accurate person is an 
adequate scorer in many instances.

(g) Pupils and other testees as a rule like short-answer
tests. The tests appeal because of their ease of taking, 
because of their wide sampling of the field, and because 
of their fairness. 



CHAPTER XIII 


Objective Achievement Tests in 
Professional Schools 


The need for objective test methods in measuring
achievement in the higher levels of education is 
just as great as in the lower schools. Universities and 
professional schools must utilize measurements of the 
achievements of their students for assigning class grades; 
for advising students educationally and vocationally; for 
recommending them for occupational and vocational 
placements; and, frequently, in the professional schools 
at least, for giving them initial certifications to enter the 
various professions, as medicine, law, etc. 

I. The Difficulties Encountered 

The development of objective tests, particularly well-studied
standardized tests at the university and professional
school level, has met with certain difficulties which
we may consider greater than those at the public school, 
or elementary and high school, level. In the first place, 
the material of many university courses is less standard- 
ized in content from course to course or college to col- 
lege. This fact limits the usefulness of standardized 
tests, but it does not affect the usefulness of less formal 
examinations constructed on objective test principles. 
In the second place, the nature of the subject matter of 
the courses often makes the construction of the objective-test
type of questions more difficult than for other educational
levels. This difficulty, however, is not insurmountable,
and can usually be overcome by training and
practice on the part of the test maker in the preparation 
of objective test material. In the third place, university 
instructors have often hesitated to accept the objective 
type of test for measurement of their students because 
they believe that the short-answer type of question fails 
to measure qualities in keeping with important aims of 
the course. To these instructors, the short-answer ob- 
jective test does not measure, to a sufficient degree, quali- 
ties such as reasoning, organization of material, and 
judgment about problems of the course. This criticism 
may be justified in the case of poorly constructed tests, 
but it is hardly a fair criticism of well-made ones — good 
tests of the objective type can measure all these qualities. 
And even those objective test forms which do measure 
primarily memory and purely informational knowledge 
about a course correlate highly with test forms which 
aim primarily at measuring reasoning, judgment, or or- 
ganizational ability as applied to the same subject matter. 

II. Early Studies in Achievement Testing 
in Professional Schools 

The application of objective test principles in the 
higher professional schools began in the early twenties.
Ben Wood’s book Measurement in Higher Education con- 
tains an examination of this type administered in me- 
chanical engineering courses in 1922. During the school 
year of 1924-5 a study of objective test methods in 
medical school subjects was undertaken in certain de- 
partments of the George Washington University Medical 
School by Moss and Hunter.¹ The following year, similar
studies were made of objective tests in certain subjects
at the College of Physicians and Surgeons, Columbia
University, under the guidance of Wood.² The
foreword to Wood’s report, written by the Dean of the
College of Physicians and Surgeons, expresses a favorable
and hopeful attitude toward the use of objective measurements
of achievement in the professional school:

1 Hunter, Oscar B., and Moss, F. A., “Standardized Tests in Bacteriology,”
Public Personnel Studies, Vol. III, No. 2, February 1925, p. 52.

Anyone who follows the work of a class of students 
in medicine is surprised each year at the showing made 
by a considerable proportion of the group. The marks 
given their examination papers are often surprisingly 
high and quite as often surprisingly low. This leads 
the one who corrects the old fixed type of examination 
paper to wonder as to his ability as a critic or as to 
the dependability of the student, or perhaps, of the ex- 
amination as a test of his mental equipment. 

The idea that an examination in the Medical School 
could be prepared so as to give a more exact estimate 
of the student’s ability was received by the members of
the staff with enthusiasm and real hope, even if asso- 
ciated with some doubt. That such a plan would also 
decrease the labor connected with rating and grading 
such examinations brought even greater joy. There- 
fore, when Professor Wood discussed the project he had 
in mind, it was taken up by several of the departments 
with enthusiasm. 

We are trying to be conservative and reserve our 
final decision in the matter, but the results of this, his 
first study with us, make us at least optimistic for the 
future. We sincerely appreciate his efforts and trust 
that the studies may continue. 

2 Wood, Ben D., “New Type Examinations in the College of Physicians
and Surgeons,” The Journal of Personnel Research, Vol. V, No. 6,
October 1926, p. 227.

III. An Objective Bacteriology Test

As an example of measurement of achievement at the
professional school level, we shall discuss one of the tests
developed in the George Washington study, the test in
Bacteriology. This test consists of four parts, as follows:

Test 1: Organisms, Methods, Infection, and Immunity.
Section A of this test consists of 50 questions in the
multiple-choice form, and Section B of 50 questions in
the true-false form.

Test 2: Recognition and Diagnosis of Bacterial
Micro-Organisms. In Section One are the names of 15
micro-organisms, 10 of which are described in Section
Two. The person being tested writes the name of the
organism listed in Section One which most nearly fits the
description given in Section Two.

Test 3: Laboratory Procedure. Four procedures, each of
which can be accomplished in five steps, are listed, but
not in the order in which they would actually be done;
in addition, several useless steps are included. For each
procedure, the person taking the test numbers from one
to five the steps that he would use to carry out the
procedure, in the order in which he would use them.

Test 4: Identification of Micro-Organisms from Lantern
Slides. Ten lantern slides of pathogenic bacteria, showing
typical colonies, various cultural characteristics, cell
groupings, and the typical morphology of the organisms,
are shown. From the slide the organism must be identified.

Portions of this test are reproduced below as sample material.³

TEST I— ORGANISMS, METHODS, INFECTION, 
AND IMMUNITY 

Section A

Directions. For each of the following questions four
answers are suggested. Before each answer is a space
in which to make a mark. Read over the four answers
and then place a cross (X) in the space before the
answer which is best or most nearly correct. Do not
place a cross (X) before more than one answer under
each question; if you do, your work on that question will
not be counted. Only one of the four answers is the
best one and you are to show by the cross (X) which
it is.

3 Quoted by permission.

1. Where the presence of an organism reinforces or 
augments the growth of another organism that 
condition is spoken of as: 

Saprophytism 

Parasitism 

Symbiosis 

Plasmolysis 

2. It has been demonstrated that the disinfectant 
power of most chemical substances is proportional 
to: 

Concentration 

Dissociation into ions 

Solubility in alcohol 

Temperature at which boiling takes place 

3. The one of the following micro-organisms which 
usually occurs intra-cellularly is the: 

Staphylococcus aureus 

Bacillus diphtheriae 

Gonococcus 

Bacillus tuberculosis 

4. In sterilizing in the hot air chamber the heat is 
kept for about one hour’s time at a temperature of: 

50 degrees C 

100 degrees C 

150 degrees C 

200 degrees C 

5. Material rich in albuminous substances is usually 
sterilized: 

In a hot air chamber 

In an Arnold

In an autoclave

In an inspissator 

6. A substance which inhibits growth and multiplica- 
tion of bacteria, but does not destroy them, is 
called: 


A germicide 

A disinfectant 

An antiseptic 

A sterilizer 

7. For substances not injured by high temperatures or 
moisture, the quickest method of sterilization is by: 

The inspissator 

The Arnold sterilizer 

Boiling 

The autoclave 

8. The “Arnold” is often used in sterilization by:

The hot air method 

Steam under pressure 

Exposure to live steam at 100 degrees C 

Boiling 

9. Facultative anaerobes are bacteria which: 

Cannot grow in the presence of free oxygen 

Cannot grow without free oxygen 

Prefer free oxygen but can grow without it 

Prefer an environment in which there is no free oxygen
but can grow in the presence of free oxygen

Section B 

Directions. Examine each statement below and
decide whether it is true or false. If the statement is
true, encircle the T. If it is false, encircle the F.



Sample:

(T)  F      The streptococci occur in chains.

 T  (F)     The chief cause of tuberculosis is the typhoid
            bacillus.

 T   F   1. Organisms that retain the gentian-violet are said
            to be Gram positive.

 T   F   2. Plasmoptysis occurs when the cell is removed from
            a medium of high osmotic pressure to one of low
            osmotic pressure.

 T   F   3. The development of bacteria is arrested more often
            by accumulation of waste products than by
            exhaustion of nutrient material.

 T   F   4. The term immunity as used by bacteriologists means
            the inability of an animal to become infected by a
            certain micro-organism.

 T   F   5. The characteristic affinity of specific bacterial
            poisons for certain tissues is generally recognized
            by bacteriologists.

 T   F   6. As a rule the mutations produced in a micro-organism
            readily revert to type when the micro-organism is
            subjected to the proper environment.

 T   F   7. Sporulation is the most common method of
            multiplication in many forms of bacteria.

 T   F   8. All bacteria require oxygen in some form to grow
            and reproduce.

 T   F   9. Low temperatures are on the whole more destructive
            of bacteria than high ones.

 T   F  10. In the case of the halogens the germicidal power is
            directly proportional to their atomic weight.


TEST II— RECOGNITION AND DIAGNOSIS OF 
BACTERIAL MICRO-ORGANISMS 

Directions. In Section One below are the names of
fifteen organisms, ten of which are described in Section
Two. Read each of the ten descriptions and write on
the line after it the name of the organism in Section
One which most nearly fits the description.


Section One 


 1. Pneumococcus
 2. Bacillus anthracis
 3. Bacillus tetani
 4. Bacillus diphtheriae
 5. Bacillus subtilis
 6. Bacillus tuberculosis
 7. Bacillus coli communior
 8. Streptococcus hemolyticus
 9. Staphylococcus aureus
10. Gonococcus
11. Bacillus pestis
12. Bacillus pyocyaneus
13. Bacillus typhosus
14. Meningococcus
15. Spirochaeta pallida


Section Two 

1. A non-motile, facultative anaerobic, non-liquefying,
   non-chromogenic, acid fast, parasitic, non-spore bearing,
   gymnobacterium, highly pathogenic for man, not readily
   stained by the usual aniline dyes; Gram positive; growing
   only on special media, exhibiting many involution forms in
   old cultures, but usually seen in fresh smears as a
   delicate, slender, slightly curved rod with rounded ends
   and beaded body

2. A small, coffee-bean shaped, non-motile, facultative
   anaerobic, non-liquefying, non-chromogenic coccus,
   spontaneously pathogenic only for man; staining readily
   with the usual aniline dyes; Gram negative; produces no
   spores, has no capsules; can be grown only on special
   media; usually seen intracellularly in smears made
   directly from fresh pus


TEST III— LABORATORY PROCEDURE 

This test consists of four procedures frequently performed
by a bacteriologist. Each procedure can be accomplished
by five steps. The five steps are given, but
not in the order in which they would actually be done.
In addition to the five steps actually necessary, several
useless, additional steps are listed. Look over all
the steps suggested for the procedure and number the
steps from 1 to 5 that you would use in the order in
which you would use them.

Sample: Procedure for making a simple stain; 3 steps necessary:

  1   Make a thin smear and fix it on a slide by passing
      through the flame.

      Decolorize with alcohol.

      Apply Gram’s iodine solution.

  3   Wash off excess stain with water.

  2   Apply methylene blue stain.

      Wash in chloroform for one minute.

1. Procedure for making a Gram stain; 5 steps necessary:

      Put on Gram’s iodine solution and wash off excess stain
      with water.

      Decolorize with 95 percent alcohol and wash off excess
      alcohol with water.

      Wash off excess chloroform with water.

      Apply 5 percent aqueous carbolic acid.

      Put on methylene blue.

      Wash in chloroform for one minute.

      Counterstain with safranin and wash off excess stain
      with water.

      Make a thin smear and fix on slide by passing through
      the flame.

      Steam over the flame for three minutes.

      Apply aniline gentian violet and wash off excess stain
      with water.

TEST IV— IDENTIFICATION OF MICRO-ORGAN- 
ISMS FROM LANTERN SLIDES 

You will be shown ten slides, each showing the
morphology and cultural characteristics of a micro-organism.
You will be allowed one minute to study
each slide and identify the micro-organisms. After the
slide is removed you are to write the name of the
micro-organism in the appropriate space below.

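1 ______________     6 ______________
2 ______________     7 ______________
3 ______________     8 ______________
4 ______________     9 ______________
5 ______________    10 ______________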


IV. Study and Standardization of the Tests 
in Bacteriology 

In the validation and standardization of the tests in 
bacteriology, the first step was to find a considerable 
group of persons whose abilities could be determined with 
reliability. The group used in the study of these tests 
included two classes in bacteriology taught during the 
school year of 1924-5 (altogether 100 students). The
painstaking cooperation of three instructors of these stu- 
dents made possible the establishing of a reliable cri- 
terion of their abilities against which to check the new 
test material. 

The criterion used was a rating, established as follows:
Each day during the course the class was divided and 
given a one-hour quiz by the professors; the professors 
alternated sections each day. Every student was called 
upon at each quiz and was rated according to his answers. 
These same professors also observed the men in the 
actual performance of the laboratory work, and in addi- 
tion to this they had given the men three written exami- 
nations previous to making their ratings. In that way 
the professors whose ratings were secured were enabled 
to know their men quite well. 

In estimating the abilities of the students the three 
judges each made independent estimates, using the 7-point
scale described below. It was definitely understood
by each of the three that the estimates were to be
based purely on the man’s knowledge of the subject and 
should in no case be influenced by his personal traits, 
study habits, or scholarship in other subjects. Each of 
the three judges agreed to the following standards of 
rating: 

The ratings should be based on an all-around esti- 
mate of the man’s knowledge of bacteriology. His 
personality and his habits (such as neatness and per- 
sistence) should not be taken into consideration in any 
way whatever in making up this rating. This judg- 
ment should be based solely upon the judge’s opinion 
of the man’s information about the subject, as mani- 
fested in his daily quizzes, his laboratory work, and his 
monthly examinations. 

The students are to be rated in seven groups. The 
4 or 5 very outstanding individuals are to be given a 
rating of 7, or exceptionally superior. The 9 or 10 
individuals who are noticeably above average are to be 
given a rating of 6. The 18 or 20 individuals who are
slightly above average are to be given a rating of 5.
The 30 or 35 individuals who are just average, fair, or
ordinary are to be given a rating of 4. The 18 or 20
individuals who are slightly below average are to be
given a rating of 3. The 9 or 10 individuals who are
noticeably below average are to be given a rating of 2.
The 4 or 5 individuals who are exceptionally poor are
to be given a rating of 1. 
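
The quotas in these standards can be checked mechanically against any one judge's list of ratings. The sketch below is only an illustration and was not part of the study: the function name and both sample lists are hypothetical, and the quotas are entered on the assumption that the distribution is symmetric about the middle rating of 4.

```python
from collections import Counter

# Quotas per rating for a class of about 100, as (fewest, most).
# Assumed symmetric about the middle rating of 4.
QUOTAS = {
    7: (4, 5), 6: (9, 10), 5: (18, 20), 4: (30, 35),
    3: (18, 20), 2: (9, 10), 1: (4, 5),
}


def quota_violations(ratings):
    """Return {rating: count} for every rating whose count is outside its quota."""
    counts = Counter(ratings)
    return {
        rating: counts[rating]
        for rating, (low, high) in QUOTAS.items()
        if not low <= counts[rating] <= high
    }


# Two hypothetical sets of 100 ratings from one judge.
within = [7] * 5 + [6] * 10 + [5] * 20 + [4] * 30 + [3] * 20 + [2] * 10 + [1] * 5
too_generous = [7] * 12 + [6] * 10 + [5] * 20 + [4] * 30 + [3] * 20 + [2] * 8

print(quota_violations(within))
print(quota_violations(too_generous))
```

An empty result means the judge has kept within every quota; otherwise each offending rating is listed with the number of students who received it.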

After each of the three judges had independently pre- 
pared his estimates, the three held a conference to try 
to discover the reasons for differences. The cases where 
their estimates differed more than two points on the 7- 
point scale were brought up for special consideration in 
order to determine why their agreement was not closer. 
In practically all these cases one or the other of the 
judges had either overlooked something that the other
judges knew, knew some trait that the other judges did 
not know, or was estimating on something that could 
not properly be considered as part of the student’s knowl- 
edge and information on the subject of bacteriology. As 
a result, each of the judges made some revisions in his 
estimates and in the end there was a much closer agree- 
ment than before, the final result being that in only a few 
instances were there more than two points of divergence 
between the ratings of any two judges for any individual. 
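
The rule used to select cases for the conference, a difference of more than two points between any two judges, can be sketched as follows. The names and the three sets of ratings are hypothetical and serve only to illustrate the rule; they are not taken from the study.

```python
from itertools import combinations


def max_divergence(ratings):
    """Largest gap between any two judges' ratings of one student."""
    return max(abs(a - b) for a, b in combinations(ratings, 2))


def needs_conference(ratings, limit=2):
    """True when some pair of judges differs by more than `limit` points."""
    return max_divergence(ratings) > limit


# Hypothetical first-round ratings as (Judge A, Judge B, Judge C).
first_round = {
    "Student X": (3, 6, 4),
    "Student Y": (5, 6, 7),
    "Student Z": (4, 4, 4),
}

flagged = [name for name, r in first_round.items() if needs_conference(r)]
print(flagged)
```

Note that a spread of exactly two points, as for the second student, is not brought up for special consideration; only a spread of more than two points is.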

The revised ratings were then added and the average 
taken. The first 25 of these, with the bacteriology test 
scores, are given in Table XXI. 

Table XXI

RECORDS FOR BACTERIOLOGY TEST STUDY

Student’s         Instructors’ Ratings                 Score on
 Number     Judge A   Judge B   Judge C   Average   Bacteriology Test

    1          7         7         7        7             193
    2          7         7         7        7             191
    3          5         7         7        6.3           173
    4          6         6         6        6             176
    5          6         6         6        6             175
    6          6         6         6        6             173
    7          6         6         6        6             172
    8          6         6         6        6             172
    9          5         6         7        6             171
   10          6         6         6        6             166
   11          6         6         6        6             166
   12          6         7         5        6             164
   13          6         7         5        6             163
   14          6         6         6        6             163
   15          5         5         7        5.7           181
   16          6         6         5        5.7           153
   17          5         5         6        5.3           163
   18          5         5         5        5             168
   19          4         5         6        5             165
   20          4         5         6        5             164
   21          5         6         4        5             162
   22          4         5         6        5             160
   23          4         6         5        5             155
   24          5         5         5        5             150
   25          6         5         4        5             143

The correlation coefficient between the criterion of bac- 
teriological knowledge and ability and the total test 
scores on the new type examination is plus .79, a much 
higher relationship than we can find between studies that 
have been made of validity of old type examinations. 
The reliability coefficient of the new type material is plus 
.90, as compared with the usually low reliability coeffi- 
cients for old type examinations. 
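
To make the computation concrete, the sketch below averages the three judges' ratings for the 25 students listed in Table XXI and computes the product-moment correlation between that average and the test score. It is an illustration only: the function names are our own, and because these 25 are the highest-rated students of the 100, the coefficient obtained from them should not be expected to equal the .79 reported for the whole group.

```python
from math import sqrt

# (Judge A, Judge B, Judge C, test score) for the 25 students of Table XXI.
ROWS = [
    (7, 7, 7, 193), (7, 7, 7, 191), (5, 7, 7, 173), (6, 6, 6, 176),
    (6, 6, 6, 175), (6, 6, 6, 173), (6, 6, 6, 172), (6, 6, 6, 172),
    (5, 6, 7, 171), (6, 6, 6, 166), (6, 6, 6, 166), (6, 7, 5, 164),
    (6, 7, 5, 163), (6, 6, 6, 163), (5, 5, 7, 181), (6, 6, 5, 153),
    (5, 5, 6, 163), (5, 5, 5, 168), (4, 5, 6, 165), (4, 5, 6, 164),
    (5, 6, 4, 162), (4, 5, 6, 160), (4, 6, 5, 155), (5, 5, 5, 150),
    (6, 5, 4, 143),
]


def mean(values):
    """Arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)


def pearson_r(xs, ys):
    """Product-moment correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


criterion = [mean(row[:3]) for row in ROWS]  # average of the three ratings
scores = [row[3] for row in ROWS]            # bacteriology test scores

r = pearson_r(criterion, scores)
print(f"r for the {len(ROWS)} listed students = {r:.2f}")
```

The same function applied to the averaged ratings and test scores of all 100 students would give the validity coefficient discussed above.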

The study of these tests in bacteriology also included 
the establishment of tentative norms based upon those 
cases included in the validity study and additional cases 
tested in other schools. The norms as stated are given in 
Table XXII. 


Table XXII

NORMS ON BACTERIOLOGY TEST 


10% made 167 or more 
20% made 158 or more 
30% made 148 or more 
40% made 142 or more 
50% made 137 or more 


60% made 132 or more 
70% made 126 or more 
80% made 117 or more 
90% made 101 or more 
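
Norms stated in this form can be used as a simple lookup: a student's score is compared with each cutoff in turn, beginning with the highest. The following sketch assumes nothing beyond the figures in Table XXII; the function name and the four sample scores are chosen only for illustration.

```python
# Table XXII as (per cent of students, lowest score made by that per cent).
NORMS = [
    (10, 167), (20, 158), (30, 148), (40, 142), (50, 137),
    (60, 132), (70, 126), (80, 117), (90, 101),
]


def top_band(score):
    """Smallest tabled per cent whose cutoff the score reaches, else None."""
    for percent, cutoff in NORMS:
        if score >= cutoff:
            return percent
    return None


for score in (193, 150, 137, 95):
    band = top_band(score)
    if band is None:
        print(score, "falls below the lowest tabled cutoff")
    else:
        print(score, "is among the scores made by the top", band, "per cent")
```

A score of 150, for example, reaches the cutoff of 148 but not that of 158, and so falls among the scores made by the top 30 per cent.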


The authors point out the following advantages of the 
new type test development: (1) It allows for the pos- 
sibility of making a wide sampling of information and 
judgment; (2) The giving and rating of the tests is 
characterized by ease, exactness, and quickness; (3) The 
tests allow of the possibility of standardization; (4) Additional
sets of tests of equal difficulty can be constructed;
(5) The abilities (in bacteriological success) to
be measured are wide and varied, so that varied types of 
tests as measuring instruments seem suitable; (6) Those 
taking the new type tests found the material interesting 
and, as a test, fair and to the point. 



CHAPTER XIV 


The Measurement of Job Efficiencies 

So far we have limited our consideration of measurement
of achievement to the accomplishments of
students in the schools. There is at least one other big 
group of individuals whose achievements must constantly 
be weighed in the balance. This group is made up of 
the employees of industry, business, and government. 
The significance of an accurate means of measuring their 
achievements (or efficiencies, to use the employer’s term) 
can hardly be overestimated. Such measurements are 
the logical bases for promotions and other rewards for 
good service; they are the starting point for adjustment 
procedures in cases of unadjusted workers; they determine
the order of lay-offs in time of unemployment; they
are the criterion against which employment methods are 
checked. 

In a small percentage of cases, job achievements or 
efficiencies can be quantitatively measured by a simple 
count of production. The performance of the factory 
worker who folds handkerchiefs all day can be measured 
by the number he folds; or the automobile salesman’s 
efficiency may be measured by the number of cars he sells. 
But the vast majority of workers are not performing 
duties which can be easily measured. It is for this large 
group that psychologists, employers, and personnel ad- 
ministrators have combined their efforts in attempting 
to devise means of accurately measuring job efficiencies. 

Where direct quantitative measures of human perform- 
ances or human traits are not available, and where some 
measurement seems necessary, resort has invariably been 
had to some sort of estimation by superiors. These esti- 
mates have constituted the “efficiency ratings” or “serv- 
ice ratings” assigned to employees by superiors, foremen, 
or employers. Much of the contribution of psychology 
to measurement of job efficiencies has been in the im- 
provement of methods of assigning service ratings. 

We shall examine, as a sample, one study in the field 
of service ratings. (For a discussion of common types 
of rating scales the reader is referred to Chapter XIX.) 
We shall also discuss one example of the application of 
objective tests in arriving at achievement measures in a 
vocation that has commonly been the source of much 
trouble because of the subjectiveness and inaccuracy of 
assigned efficiency ratings. 

I. A Service Rating Scale 

About 1928, J. B. Probst began work on a service rating 
scale which might serve as a basis for accurate measure- 
ment of job efficiencies in a great variety of jobs. His 
scale was later extensively studied through a grant to the 
Bureau of Public Personnel Administration. In the re- 
port on this study, Mr. Telford, then director of the Bu- 
reau, took a very hopeful attitude toward the possibility of 
securing reliable measures of job achievements through 
use of the new rating scale. We quote from his fore-
word to the report.*

Personnel administration as yet falls in the category 
of activities whose success or failure is not closely meas- 
urable. The esteem in which a personnel system is 


* Probst, J. B., Service Ratings, Technical Bulletin No. 4, Bureau of 
Public Personnel Administration, Chicago, p. 6. 





held depends only to a small degree upon the extent to 
which positions are classified with reference to the du- 
ties and responsibilities of their incumbents, the abso- 
lute and relative levels of pay for various kinds of work, 
the extent to which a competent or incompetent person- 
nel is secured and retained, and the methods used in 
fixing hours of work, checking attendance, granting an- 
nual and sick leaves of absence, and making demotions 
and removals. It is hardly too much to say that in 
most organizations, both large and small, both public 
and private, the esteem in which the personnel system 
is held depends not upon whether the bridge carries 
the load or falls into the stream, but upon whether those 
things which are undertaken are done with a certain 
degree of diplomacy so as to avoid giving offense to the 
persons involved. 

The need for a measuring instrument to indicate the 
degree of success or failure in personnel work has for a 
good many years been generally recognized. Time 
after time the attempt has been made to devise, in- 
stall, and operate a system of service ratings for this 
very purpose. Invariably, however, the results have 
been such that the most ardent advocates of these sys- 
tems have soon perceived their failure actually to 
measure performance. Neither the management, the 
administrative and supervisory officers, the employees, 
nor the central personnel agency have long believed 
that the so-called efficiency rating system really indi- 
cates efficiency or anything else of particular value. 

In view of the numerous unsuccessful attempts to de- 
vise and operate systems which really measure perform- 
ance on the job, either relatively or absolutely, a priori 
reasoning would lead to the conclusion that the system 
evolved by Mr. J. B. Probst, which is described in this 
book, would be neither much worse nor much better than 
its predecessors. Extensive experimental work with rigid
statistical analyses of the results, as well as a different 
approach to the subject, has, however, led those con- 
nected with the Bureau of Public Personnel Adminis- 
tration and the Civil Service Assembly of the United 





States and Canada to the belief that Mr. Probst actu- 
ally has produced a measuring instrument which, while 
still not as accurate as those devices which determine the 
diameter of a bolt to the thousandth of an inch, is fairly 
comparable to the yard stick graduated into feet and 
inches. 

The service rating study resulted in the development 
of the Service Report Form and a scoring system for 
deriving from the superior’s reports a quantitative rating 
for each employee rated. On pages 244 and 245 are re- 
produced in reduced size the front and reverse sides of 
the report form. The general nature of the service rat- 
ing scheme and some of its characteristics as compared 
with less successful plans of rating may be indicated in 
terms of the aims set by the author. He states that² “an
attempt was made to develop a system that would 
eliminate . . . particularly the following: 

(a) “Halo.”

(b) Adjustments of ratings to harmonize high and low
raters. 

(c) Inconsistency between reports or ratings made at 
different times by the same person. 

(d) Need for judging the various employees as to their 
relative excellence in each trait. 

(e) Necessity for special training of the reporting offi- 
cer. 

(f) Judgment ratings that admit of little chance for
review. 

It was decided at the outset that the new system 
must provide, as far as possible: 

(a) That the employee’s performance be reported not 
merely in general conclusions but in statements of 
fact or specific and verifiable judgments. 

2 Ibid., p. 22.





(b) That the facts, traits, or qualities be stated on the 
report form in terms of the everyday thinking of 
the reporting officer, not in letters or percentages. 

(c) That the reported facts be properly interpreted and 
evaluated by a process which, in the very nature of 
things, must be developed through extensive ex- 
periments based on thoroughly sound principles. 



FOR APPRAISING AN EMPLOYE'S SERVICE VALUE

FOR THE SIX MONTH PERIOD

INSTRUCTIONS

1. On this form you are to report the service value of the employe
mentioned above. The report should be for the six-month period
shown hereon, unless otherwise indicated.

2. In addition to the blanks to be filled in on this side of the
sheet, you should check (with an X) all those items on the other
side that you can find which will properly fit or describe this
employe. Do not guess. If you are not reasonably sure that the
employe possesses the trait or quality indicated by a certain item,
do not check that item at all. It is not necessary to check any
particular number of items. You may be able to check 25 or more
for one employe and have difficulty in finding more than a dozen
or so to describe properly some other employe. Make your X's
small; keep them inside the little squares. Do not change the
wording of any item.

3. This sheet should be checked independently by three supervisory
officers, wherever possible. Each officer should select one of the
three check columns in which to make his X marks, and should keep
all his marks within that same column on both sides of the sheet.
The officer who is lowest in rank or authority should be the first
to check the sheet, then the next higher (or equal) in rank should
check, and the one of highest authority should check last.

4. Some items, such as "Pleasing and mellow voice," "Active and
strong," "Good headwork in emergencies," and a few others, should
be considered only if they are deemed essential or desirable for
the particular position.

(a) For sickness, with pay ____ days
(b) For sickness, without pay ____ days
(c) For personal reasons, [illegible] ____ days
(d) How many days suspended, if any ____ days

Check only one item in each of the following boxes. In doing this,
you may consider not only the punctuality of the employe in
reporting for work, but also in answering calls, keeping
appointments, and handing in reports.

Check
1 2 3
□ □ □ Nearly always late
□ □ □ Usually late
□ □ □ Often late (about half the time)
□ □ □ Usually punctual
□ □ □ Never, or hardly ever, late

□ □ □ Nearly always quits ahead of time
□ □ □ Usually quits ahead of time
□ □ □ Often quits ahead of time
□ □ □ Watches clock too much near quitting time
□ □ □ Seldom quits ahead of time
□ □ □ Never quits ahead of time

[Boxed notice beginning "Do not check"; remainder illegible.]


Probst Service Report (Obverse Side) 







(d) That the report sheet permit the officer an optional 
selection of various traits and qualities, so that he 
may report on only those things with respect to 
which he has definite knowledge. 

(e) That the reporting officer be not required to meas- 
ure relative degrees of a quality in different em- 
ployees. 


DIRECTIONS: Place an X mark next to each of the items on this page
which you know from your own knowledge will describe or fit this
employe. Do not guess; check only if you are reasonably certain.

1 2 3
□ □ □ Lazy
□ □ □ Slow moving
□ □ □ Quick and active
□ □ □ Too old for the work
□ □ □ Minor physical defects
□ □ □ Serious physical defects
□ □ □ Indifferent
□ □ □ Talks too much
□ □ □ Too blunt or outspoken
□ □ □ Too much self-importance
□ □ □ Good team worker
□ □ □ Not a good team worker
□ □ □ Resents criticism or suggestions
□ □ □ Antagonizes when dealing with others
□ □ □ Might often be more considerate
□ □ □ Usually pleasant and cheerful
□ □ □ Always courteous
□ □ □ Cranky disposition
□ □ □ Often seems dissatisfied
□ □ □ Often grumbling or complaining
□ □ □ Uses poor judgment
□ □ □ Might often use better judgment
□ □ □ Generally uses good judgment
□ □ □ Always uses good judgment
□ □ □ Does not do his (her) share of work
□ □ □ Generally looks for the easy work
□ □ □ Must generally be told what to do
□ □ □ Work often slightly behind
□ □ □ Often needs prodding
□ □ □ Work always up to date
□ □ □ Turns out unusually large amount of work
□ □ □ Steady worker most of the time
□ □ □ Always busy at work
□ □ □ Does not accept responsibility
□ □ □ Accepts responsibility
□ □ □ Does not always obey orders willingly
□ □ □ Visits too much with others
□ □ □ Needs considerable supervision
□ □ □ Works well without supervision
□ □ □ Fine self-control, seldom loses temper
□ □ □ Loses temper easily
□ □ □ Easily rattled
□ □ □ Lacks self-confidence
□ □ □ Too easy-going
□ □ □ Learns new work slowly
□ □ □ Learns new work easily
□ □ □ Understands instructions readily
□ □ □ A willing worker at all times
□ □ □ Takes unusual interest in the work
□ □ □ Might be more orderly
□ □ □ Very orderly and systematic
□ □ □ Often forgetful
□ □ □ Often does careless work
□ □ □ Makes many mistakes
□ □ □ Usually accurate
□ □ □ Hardly ever makes a mistake
□ □ □ Accurate but very deliberate
□ □ □ Is highly expert in own work
□ □ □ Not generally reliable or dependable
□ □ □ Usually reliable and dependable
□ □ □ Always reliable and dependable


Probst Service Report (Reverse Side) 







(f) That the rating and scoring system make it unnec-
essary to “adjust” the resulting scores or ratings.

(g) That the reporting officer be virtually forced to 
report accurately or be shown by internal evi- 
dence in his own reports not to have done so. 

(h) That the results of the scoring system be suffi- 
ciently simple for the employees to understand, so 
that they themselves may easily determine in a 
general way the fairness and reliability of the dis- 
tribution of ratings. 

(i) That the scheme also take into consideration the 
ordinary mental processes of the reporting officer — 
his reluctance generally to rate negative qualities, 
his normal desire to say good things about an em-
ployee, his tendency to use superlatives in describ-
ing the favorite employee, and the like.”

The derivation of a rating from the checked report 
sheet is based on numerical values assigned to the various 
items in the report. The values are positive for some 
items and negative for others. The first part of the 
scoring process is to count the number of favorable, or 
credit, items checked on the report; the resulting total 
is termed the X score. Then the sum of all the values 
assigned to all the checked items, favorable or unfavor- 
able, is obtained; that sum constitutes the Y score. 
Numerous experiments were conducted to establish a 
means of translating these X and Y scores into reliable 
and valid ratings, meeting all statistical and technical 
requirements. Finally a rather complex formula was 
evolved, and ratings obtained with it have been found to 
reflect with considerable accuracy the worth of employ- 
ees rated. The X and Y scores are first translated into a 
numerical rating and can subsequently be expressed in 
letter ratings if desired. The scoring has been reduced 
to a purely mechanical procedure through the use of 
special stencils and scale rules. The reader is referred 





to the author’s report for details regarding the scoring. 
Probst emphasizes in his report that the officers or super- 
visors making out the report on an employee are not 
“rating” him; they are merely indicating certain facts
(sometimes approximating judgments) about him, and
that the ratings are obtained from the report by a system 
of evaluating the facts and judgments reported in ac- 
cordance with a procedure evolved by extensive experi- 
mentation. This the author believes to be one of the 
main advantages of his rating method as compared with 
those commonly used methods in which the reporting 
officer attempts to assign relative ratings or values to the 
various traits or qualities he considers in employees. 
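
The first two scoring steps described above (counting the favorable items for the X score, and summing the values of all checked items for the Y score) can be sketched in a few lines of Python. This is an illustration only, under stated assumptions: the item values below are hypothetical, since the actual values appear only in Probst's report; the function name xy_scores is our own; and the formula that converts X and Y into a final rating is not attempted, because the text does not give it.

```python
# Hypothetical item values for illustration only; the real values
# are given in Probst's report and are not reproduced in this text.
ITEM_VALUES = {
    "Quick and active": 2,
    "Good team worker": 2,
    "Always courteous": 1,
    "Often forgetful": -1,
    "Makes many mistakes": -2,
    "Talks too much": -1,
}


def xy_scores(checked_items, values=ITEM_VALUES):
    """Return (X, Y) for one officer's checked items.

    X is the number of favorable (positive-valued) items checked.
    Y is the sum of the values of all checked items, favorable or not.
    """
    x_score = sum(1 for item in checked_items if values[item] > 0)
    y_score = sum(values[item] for item in checked_items)
    return x_score, y_score


# One officer's report on one employee (hypothetical).
report = ["Quick and active", "Always courteous", "Often forgetful"]
x, y = xy_scores(report)
print("X score:", x, "Y score:", y)
```

Because the officer only checks items and never sees the values, the arithmetic stays with the scorer, which is the separation of reporting from rating that Probst emphasizes.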

General value for a system of rating efficiency of 
employees can be assumed if (1) the system gives a dis- 
tribution of ratings approximating the normal distribu- 
tion curve; (2) the ratings are reliable as shown by 
agreement between separate ratings made on the same 
group of employees; and (3) ratings are valid as shown
by agreement with reliable criteria of employees’ effi-
ciencies.

Fig. 17. Distribution of Probst Service Ratings. [Bar chart of letter ratings, E through A.]

The distribution of Probst Service Ratings. The dis- 
tribution of ratings for employees in three cities is shown 
by the vertical bars in Fig. 17. The form of the distribu- 
tion closely approximates that of the normal distribution 
curve. A great many rating scales for evaluating em- 
ployees’ achievements have given very little distribution 
of the ratings, probably because of the tendency of 
raters to bunch all the ratings at the high end of the 
scale. When we know that all the employees in a 
group are not of equal efficiency, a rating system is
certainly ineffectual if it does not differentiate among
those being rated. In view of this common defect of
rating systems, it is encouraging to find distributions
such as the three illustrated.

Fig. 18. Chart Showing Reliability of Probst Service Ratings.

The reliability of Probst Service Ratings. Reliability 
of a rating system is indicated when it produces com- 
parable results in different trials under similar conditions. 
Probst reports a correlation of .78 between two successive 
semiannual ratings for a group of 475 municipal em- 
ployees in the City Water and Power Department of 
Los Angeles. Fig. 18 shows graphically the relationship. 
An analysis shows that 155, or 32½ per cent, showed no
change in their ratings from one period to the next. Of 
the remaining number, 217 showed a change of one step, 
75 a change of two steps, 26 a change of three steps, and 
2 a change of four steps. The changes of one step can 
be regarded as practically negligible. Such a degree of 
reliability as shown by the ratings of this group approxi- 
mates the reliabilities obtained by objective psychological 
tests, and can certainly be considered high enough to 
recommend the use of such a rating system as compared 
with any system of personal estimates unaided by a 
rating device or system. 
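
The breakdown of rating changes quoted above can be checked with a short calculation. The counts are taken directly from the text; the variable names are our own, and the grouping of "no change" with "one step" simply follows the remark that one-step changes are practically negligible.

```python
# Number of employees by size of change (in rating steps) between
# the two semiannual ratings, as reported in the text.
changes = {0: 155, 1: 217, 2: 75, 3: 26, 4: 2}

total = sum(changes.values())
percent = {steps: 100 * n / total for steps, n in changes.items()}

# Share of employees whose rating was unchanged or moved one step.
within_one_step = percent[0] + percent[1]

for steps, n in changes.items():
    print(f"{steps} step(s): {n:3d} employees, {percent[steps]:.1f} per cent")
print(f"No change or one step: {within_one_step:.1f} per cent of {total}")
```

The counts sum to the 475 employees in the group, and roughly three employees in four moved by no more than one step.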

The validity of Probst Service Ratings. If the rat- 
ings are valid, the employee who is rated A and the em- 
ployee who is rated E must in actual fact be A and E 
employees, respectively. Such validity is the very es- 
sence of the value of the rating system, and yet, unfor- 
tunately, it is the most difficult factor to prove. The 
difficulty rests upon the fact that there are no reliable 
criteria in most instances of the actual value of the em- 
ployee with which to compare the ratings. Probst re- 
ports numerous studies of validity of the ratings in which 
Probst Service Ratings have been compared with various 





supervisor, foreman, and employer estimates of employee 
worth. His groups include studies of office workers, 
laborers, nurses, policemen, firemen, teachers, and others. 
The large majority of his correlations are above .60, in-
dicating considerable promise for his rating system, es-
pecially in the face of a somewhat unknown reliability of 
the criteria. 

The foregoing material has been presented, first, as an 
example of a rating scale and rating system applicable to 
measurement of employee achievement; and second, as 
an example of about the most that can be expected from
rating systems in their present stage of development. 
The reader is warned that with less carefully worked out 
rating systems, with poorer cooperation on the part of rat- 
ing officers and supervisors, and perhaps in some occupa- 
tions, results comparable with those obtained by Probst 
cannot be expected. 

II. Objective Psychological Tests in the 
Measurement of Job Efficiencies 

For a few specific jobs, objective “psychological” type 
efficiency tests have been devised and standardized. 
There are, for example, standardized typewriting tests, 
shorthand tests, bookkeeping tests, teaching efficiency 
tests, and trade tests in a few of the mechanical trades. 
These presumably can be administered at periodic inter- 
vals to employees on the job and be used as a basis for 
evaluating their job efficiency. Besides being available 
for only a very limited number of positions, they often 
fall short of satisfactory measurements in two respects: 
they often do not test all the aspects of job success; and 
they often represent artificial test situations somewhat 
removed in nature from the actual job itself. However, 




supplemented by other considerations and measurements, 
these tests are valuable tools. 

An interesting use of “psychological” tests has fre- 
quently been suggested and in a few instances carried 
out as a basis of measuring teaching efficiency. In 1928- 
29, the writer cooperated in such a study of teaching 
efficiency. Let us look very briefly into the nature of 
this study. 

The study took as its starting point the premise that 
the important purpose of teaching is to make desirable 
changes in the students, these changes being indicated in 
the amount the students learn. The purpose of teaching 
being thus defined, the teaching efficiency of any in- 
structor may be considered to vary directly with the 
desirable changes produced in the students by his teach- 
ing. To arrive at a fair measure of student improve- 
ment, it would seem necessary, then, to know three
things: the ability or natural capacity of the students to
learn; what they know about the subject before the in-
struction; and what they know at the end of the instruc-
tion.

Twenty colleges agreed to cooperate in an experimental 
rating, on such a basis, of efficiency in teaching general 
college chemistry. The same standardized test covering 
general college chemistry was given at the end of the 
year’s work to all the students of the instructors included 
in the study (6,667 students). Could it have been as- 
sumed that all the instructors were teaching students of 
the same ability and the same chemistry knowledge and 
training prior to their taking the course, the average test 
performances of their students after a year’s instruction 
might be considered indicative of their relative teaching 
efficiency. Since equalities in ability and previous train- 
ing could not be assumed for the various classes, these 




two factors were studied and the raw chemistry-test 
averages were corrected, where necessary, to equalize 
ability and previous chemistry knowledge of students of 
the different instructors. This was done by an adjust- 
ment of scores for those students who had already studied 
chemistry in high school in accordance with the amount 
of increase in final score such previous training was found 
to produce. Corrections for differences in average ability 
of different classes of students were made on the basis of 
statistical analysis, which showed that a change of one 
point in intelligence (as measured by the test used) 
affected the chemistry score by .32; so that, if an instruc-
tor’s class was 1.0 point above the average for all in-
structors’ classes in intelligence, he would have .32 subtracted
from his chemistry average. The data obtained for twelve
instructors all teaching general chemistry classes in one
college are shown in Table XXIII. Similar records were
available for all the instructors studied. The last column
of the table contains the final figures representing relative
efficiency of the instructors. By reference to this col-
umn, it can be seen that the variations in efficiency of
instruction even within the same institution are large.
Among all the instructors in all the colleges included in
the study, the efficiencies as indicated by corrected me-
dian scores on the chemistry test range from 37 to 79.

Table XXIII

COMPARISON OF EFFICIENCY OF INSTRUCTORS³

Instr.  Degree  Rank of       Years'      [Heading     Intelligence  Chemistry    Chemistry        Chemistry
                Instructor    Experience  illegible]   (Median of    [heading     (Corrected for   (Corrected for
                                                       Class)        illegible]   high school)     h. s. & intel.)

A       M.S.    Instructor    2           14.5         110.4         84.0         75.6             72.57
B       Ph.D.   Asst. Prof.   5           14           105.6         81.5         76.25            74.65
C       B.S.    Instructor    2           10           127.3         105          88               79.66
D       M.A.    Instructor    1           12.5         118.8         77.5         63.83            57.91
E       M.S.    Instructor    5           21           107.7         65           57               54.63
F       B.S.    Instructor    3           20           113.2         79           68               63.67
G       B.A.    Instructor    4           18           107.6         84.5         76.6             74.16
H       B.S.    Instructor    15          21           104.9         75           64               62.53
I       M.S.    Instructor    3           21           117.4         83.5         77.56            71.78
J       Ph.D.   Assoc. Prof.  14          15           101.2         66.75        59.5             59.18
K       B.S.    Instructor    7           18           80            69           68               74.5
L       M.A.    Instructor    2           21           115.7         91           80               75.07

Averages                                               109.6         78.04        71.07            68.09

3 Moss, F. A., Loman, Wm., and Hunt, Thelma, “Impersonal Measure-
ment of Teaching,” The Educational Record, Vol. 10, No. 1, January
1929, p. 47.
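
The intelligence correction can be illustrated with figures from Table XXIII. The sketch below is a reading of the table, not the authors' own procedure, and rests on assumptions: it uses only five rows (D, E, H, K, L) whose figures are clearly legible; it takes the correction to be .32 chemistry points per point of class intelligence, deducted for classes above the general average and added for classes below it, which is the direction the table's last two columns show; and since the text does not state the average intelligence for all instructors' classes, it backs that baseline out of each row. The function names are our own.

```python
SLOPE = 0.32  # chemistry points per intelligence point, as stated in the text

# Instructor: (class intelligence median,
#              chemistry score corrected for high school chemistry,
#              chemistry score corrected for high school and intelligence)
ROWS = {
    "D": (118.8, 63.83, 57.91),
    "E": (107.7, 57.0, 54.63),
    "H": (104.9, 64.0, 62.53),
    "K": (80.0, 68.0, 74.5),
    "L": (115.7, 80.0, 75.07),
}


def implied_baseline(intelligence, before, after, slope=SLOPE):
    """Intelligence level at which the correction would be zero (inferred)."""
    return intelligence - (before - after) / slope


def corrected(before, intelligence, baseline, slope=SLOPE):
    """Deduct credit for a brighter-than-baseline class, add it for a duller one."""
    return before - slope * (intelligence - baseline)


# Each row implies a baseline; average them to get one working value.
baselines = {name: implied_baseline(*row) for name, row in ROWS.items()}
baseline = sum(baselines.values()) / len(baselines)
print(f"Implied general average intelligence: {baseline:.1f}")

for name, (intelligence, before, after) in ROWS.items():
    estimate = corrected(before, intelligence, baseline)
    print(f"{name}: table {after:.2f}, recomputed {estimate:.2f}")
```

That the five rows imply nearly the same baseline suggests a single general average was used for every class in the study, including Instructor K's class, the only one of the five that gains from the correction.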

It may be said that other factors not taken into ac- 
count in this study affect the teaching efficiency, and so 
lower the reliability and validity of estimates based upon 
student test performances. Studies done in conjunction 
with the investigation just described showed relatively 
little effect, however, from such factors as departmental 
organization, size of institution, size of class, and teaching 
load of the instructor. While this method of measuring 
teaching efficiency may not be as easily applicable to all
types of classes as it is to chemistry, it certainly suggests 
principles along which efforts may be made to measure 
efficiency of instruction in those courses of study for 
which the content is fairly well standardized. It is a 
method of demonstrating results of teaching objectively; 
and one which would remove rating of teacher efficiency 
from the realm of subjective personal judgment. 




Part V 

MEASUREMENT IN INDUSTRIAL AND 
PERSONNEL FIELDS 




CHAPTER XV 


Historical Background of Psychological 
Measurement in Industry 

PSYCHOLOGICAL measurements in business and in-
dustry are those measurements which are applied
to the human element and to the processes directly de- 
pendent upon the human element. In the early days of 
large-scale business and industry, psychological measure- 
ments found relatively little application, because of the 
lack of emphasis upon man as a part of the production 
process. The early industrialists thought of the machine 
as the all-important element in business. The introduc- 
tion of psychology and psychological measurements into 
industry represents an admission that the machine is 
insufficient in itself to meet the demands of industry. 
The study and measurement of the human element marks 
a recognition that machines can be used to advantage 
only through properly selected and adequately trained 
men. 

I. The Foundations of Psychology Applied to 
Personnel Problems 

In his discussion of the historical background of in- 
dustrial psychology, Viteles points out that three distinct 
forces have played a part. These he refers to as the 
economic, social and psychological foundations of indus- 
trial psychology. Since industrial psychology is largely 
the application of quantitative methods to the study of 


human traits and abilities and working processes, we 
may consider these three forces as the fundamental ones 
leading to the application of psychological measurement 
to the problems of personnel. The economic foundation 
is the aim of business and industry to attain maximum 
production at minimum cost. It was to be expected that 
whatever might contribute to the attainment of this 
aim would be accepted by industry. As a matter of fact, 
industry has accepted psychology and psychological 
measurement as aids largely because of what they can 
contribute toward cheaper production and greater profits. 
One of the most important reasons why industry is will-
ing to introduce measurements of ability, of aptitudes,
and of personality as a means of selecting employees, is 
that employees selected on such a basis are worth more 
to them than employees selected on other bases. 

Antedating somewhat the specific applications of psy- 
chology among workers were a number of early studies 
which directed attention toward the human element in 
industrial management and demonstrated that such con- 
sideration was economical. These included the studies 
in scientific management conducted by such workers as 
Frederick W. Taylor and Frank B. Gilbreth. Taylor, 
toward the end of the nineteenth century, started a sys-
tem of scientific management which was the first to
emphasize the human factor in production. He became 
the pioneer of a movement which spread throughout the 
entire world of industry under the name of Taylorism. 
His system was based upon two assumptions:¹ (1) what
the workmen want from their employers beyond any- 
thing else is high wages, and what the employers want 
from their workmen is a low labor cost of manufacture; 

1 Taylor, F. W., The Principles of Scientific Management, New York, 
1911, p. 42. 




and (2) no system or scheme of management should be 
considered which does not in the long run give satisfac- 
tion to both employer and employee, which does not 
make it apparent that their best interests are mutual, 
and which does not bring about such thorough and hearty 
cooperation that they can pull together instead of apart. 
Taylor made an early application of his principles in his 
classic experiment in selecting and training men to handle 
pig iron. This experiment has been well summarized by 
Viteles: * 

The work was carried on in the plant of the Bethle- 
hem Steel Company. At the time of the experiment 
there were in operation five blast furnaces, the product 
of which had been handled by a pig-iron gang for many 
years. This gang, at the time Taylor started his work, 
consisted of about 75 men. “They were good, average
pig-iron handlers, were under an excellent foreman who 
himself had been a pig-iron handler, and the work was 
done, on the whole, about as fast and as cheaply as it 
was anywhere else at that time.” The work of han- 
dling pig-iron was done by men with no other imple- 
ments than their hands. The pig-iron handler stooped 
down, picked up a pig weighing about 92 pounds, 
walked for a few feet or yards, and then dropped it onto 
the ground, or upon a pile. Taylor believed this 
work to be so crude and elementary in its nature that 
it would be possible to train an intelligent gorilla to be- 
come a more efficient pig-iron handler than any man can 
be. And yet he felt that the science of handling pig- 
iron, in spite of the crude character of the work, is so 
great, that this type of work could be used to illustrate 
the accomplishment to be derived from properly train- 
ing competent workers in the best methods of work. 

When Taylor started to apply his principles, he found 


* Viteles, Morris S., Industrial Psychology, W. W. Norton & Co., Inc.,
New York, p. 10. 




that the gang of laborers employed in loading pig-iron 
onto a railroad car were averaging approximately 12½
long tons per man per day. After carefully observing
the methods of work and studying the number of volun-
tary pauses, etc., he reached the conclusion that a first
class pig-iron handler ought to handle between 47 and
48 tons per day instead of 12½ tons. Taylor ques-
tioned many good managers and asked them whether,
under premium work, piece work, or any of the ordi- 
nary plans of management, they would be likely ever 
to approximate 47 tons per day. Not a man suggested 
that an output of over 18 to 25 tons could be obtained 
by any of the ordinary expedients. 

Taylor then set about to choose a worker on whom he 
could first try out his new methods. He finally selected 
a Pennsylvania Dutchman whose reputation, habits, 
and ambition made him seem a likely subject. This 
man was asked whether he would prefer earning $1.85 
per day to the $1.15 which constituted his pay check at 
the time. He was told he could do so by loading in one 
day a pile of pig-iron (consisting of 47½ tons) which
was pointed out to him. He was further cautioned that 
in order to load this pile of pig-iron and to earn his in- 
creased pay, he must carefully follow the instructions 
of the man assigned to train him in the proper method 
of doing his work. He started to work, and all day 
long, and at regular intervals, he was told by the man 
who stood over him with a watch: “Now pick up a pig
and walk. Now sit down and rest. Now walk — now 
rest,” etc. He worked when he was told to work, and 
rested when he was told to rest, and at half past five 
in the afternoon had his 47½ tons loaded on the car.
He practically never failed to work at this pace and to 
do the task that was set him during the three years of 
observation by Taylor. And throughout this time he 
averaged a little more than $1.85 per day, whereas be- 
fore he had never received over $1.15 per day, which 
was the ruling rate of wages at that time in Bethlehem. 

One man after another was picked out and trained to 
handle pig-iron at the rate of 47½ tons per day until all




the pig-iron was handled at this rate, and all of the 
men still working in the gang received more wages 
than those paid to other men around them who were 
not employed on task work. However, of the gang of 
“seventy-five pig-iron handlers,” only about one man 
in eight was found physically capable of handling 47½
tons per day. With the very best of intentions, the 
other seven out of eight men were physically unable to 
work at this pace. 
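
The economic side of the experiment can be made concrete with the figures quoted above. The short calculation below is our own illustration and not part of Taylor's or Viteles' account; it takes the daily loads as 12½ and 47½ long tons and the daily wages as $1.15 and $1.85, and compares the labor cost per ton before and after.

```python
# Figures from the quoted passage.
tons_before, tons_after = 12.5, 47.5   # long tons loaded per man per day
wage_before, wage_after = 1.15, 1.85   # dollars paid per man per day

# Labor cost of loading one ton, before and after the new method.
cost_before = wage_before / tons_before
cost_after = wage_after / tons_after

# How far output and wages each rose.
output_ratio = tons_after / tons_before
wage_ratio = wage_after / wage_before

print(f"Labor cost per ton before: ${cost_before:.3f}")
print(f"Labor cost per ton after:  ${cost_after:.3f}")
print(f"Output rose {output_ratio:.1f}-fold; wages rose {wage_ratio:.2f}-fold")
```

On these figures the workman's daily wage rises while the employer's labor cost per ton falls to well under half its former level, which is the pairing of high wages with low labor cost that Taylor's first assumption describes.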

Such experiments as the one just quoted are excellent 
examples of the demonstrations to industry that atten- 
tion to the human element is economically profitable. 
While Taylor and other industrial engineers who were 
interested in his principles contributed very little to the 
theory and procedure of psychology as such, their em- 
phasis on the human factor was extremely important in 
preparing the ground for later application of psychology. 

The social foundations of industrial psychology are 
rooted in development of an attitude of concern for the 
well-being of the worker. At the beginning of the in- 
dustrial era, the worker was looked upon largely as a 
commodity with a certain commodity value, little 
thought being given to his individual happiness and well- 
being. During the nineteenth century, when develop- 
ment in industry was proceeding so rapidly, the necessity 
of protecting and conserving the human element was 
rarely recognized. Attention was given to the conserva- 
tion of capital, machinery, and raw materials, but 
waste of human life, the devastating effects of fatigue, 
and destruction of health in industry received very little 
attention. Recent years have brought a considerable 
change in this viewpoint and today individual welfare in 
industry is acknowledged to be an important concern of 
society. The concern is reflected in all sorts of social 
welfare measures, compensation for accidents, pension 
systems, health protection, etc. This attention to the 
individual welfare of the worker has finally culminated 
in the development in large organizations of separate per- 
sonnel departments, usually manned with individuals 
trained in psychology and equipped with the various in- 
struments for studying and measuring aspects of human 
abilities and skills. 

The psychological foundations of industrial and per- 
sonnel psychology are rooted in the development of psy- 
chological techniques of aid in dealing with the worker 
or employee. The introduction of experimental and 
quantitative methods into psychology marks the begin- 
ning of the development of techniques which can be util- 
ized in solving the problems of the worker. The study 
of individual differences has also been fundamental to the 
development of industrial psychology. In fact, indus- 
trial psychology is concerned with the individual — in his 
reactions in a specific vocational situation. More re- 
cently, the development of the various types of psycholog- 
ical tests has contributed immensely to the solution of 
problems of employment. 

II. Early Experiments in Industrial Psychology 

The earliest applications of quantitative methods of 
studying the worker were scattered experiments done in 
psychological laboratories. Some of the early investi- 
gators became interested in problems of fatigue among 
workers and conducted various types of studies in which 
attempts were made to measure this fatigue. Others of 
the early investigators were interested in studies related 
to the training of workers. Of significance are the in- 
vestigations of Bryan and Harter on the learning of 
telegraphy, and of Book on learning to typewrite. As 
early as 1911, Walter Dill Scott published a monograph 
in which he discussed the applications of psychological 
devices in recruiting employees and in increasing the 
quantity and quality of their work. 

While many of these early experiments had a relation- 
ship to the application of psychology in industry, no sys- 
tematic attempt to apply psychological tests or measure- 
ments to a practical industrial problem was undertaken 
before the pioneer work of Münsterberg in 1912. At 
that time he undertook an experiment in scientific selec- 
tion of street-car motormen at the suggestion of and with 
the cooperation of the American Association for Labor 
Legislation. Those interested in the experiment were 
concerned mainly with the problems of safety and acci- 
dent prevention. They recognized the importance of the 
human element in the causation of accidents and were 
therefore willing to sanction a psychological study of 
the problem. Münsterberg undertook to analyze the 
human qualities important in accident prevention and to 
devise tests for measuring the qualities. After careful 
study, Münsterberg stated that one of the most necessary 
qualities in the safe operation of the street car was “a 
particular complicated act of attention by which the 
manifoldness of objects, the pedestrians, the carriages, 
and the automobiles are observed with reference to rapid- 
ity and direction in the quickly changing panorama of 
the street.” He emphasized as of extreme importance 
“the ability to keep attention constant, to resist distrac- 
tion by chance happenings on the street, and especially 
the always needed ability to foresee the possible move- 
ments of pedestrians and vehicles.” 

For these qualities, Münsterberg devised a test consisting 
of a series of cards representing a street with the 
various conditions which might exist with reference to 
pedestrians and vehicles. The arrangement of a sample 
card is shown in Fig. 19. The heavy lines represent 
street car tracks. The space on either side of the track 
is divided into 64 units, represented by small squares. 
The “1” digits represent pedestrians, who move just one 
step, from one unit to the next. The “2” digits represent 
horses, which move twice as fast, or two units. The “3” 
digits are automobiles, moving three units. Black digits 
move parallel to the track and are, therefore, not sources 
of danger in causing street car accidents. Red digits 
stand for pedestrians or vehicles which move toward the 
track and are potential sources of danger. Those red digits 
which would land in the track in taking the number of 
steps indicated are considered dangerous ones, and in 
taking the test the subject must pick out as quickly as 
possible those points on the tracks that are so threatened. 
As the test is taken, the cards are exposed by a moving 
belt with a window that exposes the whole width of the 
card and an area of 5 units’ length. The subject turns 
a crank moving the belt as he takes the test. The score 
on the test is determined by errors and time taken to 
observe. Münsterberg gave the test to a group of motormen 
of a street railway company and found considerable 
relationship between scores made and accident records. 

Fig. 19. Sample Card from Münsterberg’s Test for Street-Car 
Motormen. Large, heavy figures represent red digits. 
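
The rule by which a digit is judged dangerous can be stated compactly. The following Python sketch is an illustration only and is not part of Münsterberg's procedure. It assumes a hypothetical record for each digit (its row along the track, its distance from the track in units, its value, and its color), and it assumes that a red digit "lands in the track" exactly when its value equals its distance from the track.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Digit:
    row: int       # position along the track (hypothetical indexing)
    distance: int  # number of units between the digit and the track
    value: int     # 1 = pedestrian, 2 = horse, 3 = automobile
    red: bool      # True if the digit moves toward the track


def threatened_rows(digits):
    """Return the sorted track rows threatened by at least one red digit.

    Assumption: a red digit is dangerous only when the number of units
    it moves (its value) equals its distance from the track.
    """
    return sorted({d.row for d in digits if d.red and d.value == d.distance})


# A made-up card fragment, not taken from Fig. 19.
card = [
    Digit(row=0, distance=2, value=2, red=True),   # lands in the track
    Digit(row=1, distance=3, value=2, red=True),   # stops short of the track
    Digit(row=2, distance=1, value=1, red=False),  # black: moves parallel
    Digit(row=3, distance=3, value=3, red=True),   # lands in the track
]
print(threatened_rows(card))
```

Under these assumptions the subject's task amounts to reporting the rows returned by `threatened_rows`; errors and time would then be scored against that list.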

This pioneer experiment of Münsterberg was the starting 
point for a number of trials of his test, for the development 
of a number of similar tests, and, in its far-reaching 
effects, for the beginning of a widespread cooperation be- 
tween the science of psychology and industry in the solu- 
tion of industrial problems. 

III. The Effect of the World War 

The development of psychological measurements for re- 
cruits in the United States Army during the World War 
represented the largest-scale experiment in the application 
of psychological methods to an industrial problem that 
we have ever had in this country. The work of developing 
the mental tests is described elsewhere in this book.* 
In addition to the well-known mental tests, investigations 
were conducted in the development of specialized tests 
for aviation and for other more technical occupations. 
There were developed not only measures of general capacity, 
but a series of trade tests for measuring the skill of 
men being considered for assignment to various special 
skilled trades, such as electrical work. These various 
developments in the Army were the starting point for many 
experiments on scientific selection for various occupations 
which were carried on after the war in civilian life. In 
fact, the war testing probably gave the impetus to the 
rapid rise of selection methods for workers based upon 
the testing of capacities and skills. 

* See Chapters III and IV. 

IV. History of the Testing of Applicants 

Since the greatest, or at least the most extensively used, 
contribution of psychology to personnel problems has 
been in the field of employment examinations, our his- 
torical sketch might well include a view of the testing of 
applicants. Testing by examinations on an extensive 
scale began in the public service in connection with the 
selection of government employees. The setting up of 
examinations in the public service dates generally from 
the establishment of the United States Civil Service 
Commission. This Commission was established by law 
in 1883 to select employees on a basis of merit, to sup- 
plant the old spoils system under which employees held 
their jobs largely through political influences. 

Since the founding of the Civil Service Commission 
there has been a continuous evolution of the system of 
testing applicants, starting with the borrowing of purely 
academic examinations from the schools, and culminating 
in the extensive work on psychological tests carried on 
by the Research Division of the Civil Service Commis- 
sion. The stages of the evolution may be represented by 
the following types of tests: (1) examinations in terms 
of school subjects; (2) essays in terms of the job; (3) 
general intelligence tests; (4) special aptitude tests in 
terms of the job; (5) achievement or trade tests in objective 
or short-answer form. 

1. Examinations in terms of school subjects. When 
the Civil Service Commission first faced the problem of 
selecting employees on the basis of merit, practically 
nothing had been done in the development of examina- 
tion methods which might indicate degrees of merit. 
The only types of examinations which the Civil Service 
Commissioners found available were the academic ex- 
aminations used by educators as a basis for judging per- 
formance and merit in school work. No other examining 
method being available, those given the job of selecting 
government personnel under the new Civil Service Act 
began using these academic tests for all kinds of posi- 
tions. The tests usually consisted of parts covering such 
subjects as arithmetic, spelling, composition ability, pen- 
manship, and grammar. The tests varied from one job 
to another only in the difficulty of the questions. The 
same types of academic tests were used to fill such diverse 
positions as clerk, librarian, matron, patrolman, and mes- 
senger boy. 

As we view this type of examination today, the only 
merit which it might possess as a basis for selecting such 
diversified employees would be in its selective power 
with reference to general ability or general mentality. 
Presumably, those who used the examinations might 
have believed that high test attainments were indicative 
of high general ability. However, numerous experimen- 
tal studies have demonstrated that there is relatively low 
relationship between one’s penmanship ability and one’s 
mental alertness, or between one’s spelling ability and 
one’s mental alertness. About the only thing that we 
can say in favor of the purely academic tests is that they 
were undoubtedly better than selections based purely 
upon personal opinion gained from an interview or some 
other similar method. 

2. Essays in terms of the job. This stage in the evo- 
lution of methods of examining applicants grew immedi- 
ately out of the objection to the academic tests on the 
score that they were unrelated to the jobs to be filled. 
In the first attempts to overcome this objection, we find 
Civil Service examiners using the same examination 
techniques but asking questions having some bearing on 
the job to be filled. If a composition was to be written, 
the subjects suggested to the applicant related specifi- 
cally to the job, instead of being of a general nature. In 
this stage of development, the examinations often included 
a specific part on the job; if the job was that of 
plumber, some of the questions asked related specifically 
to plumbing. But since this stage antedated the devel- 
opment of objective, short-answer types of tests, the 
questions were of a discussion or essay type. 

The chief advantage of this improvement in examina- 
tion methods lies in the more specific bearing of the ques- 
tions on the job, which made them appeal to competitors 
and employers as being practical. Disadvantages were 
chiefly those which we pointed out in Chapter XII as 
existing in the case of essay tests as compared with more 
objective test methods. This method of selecting em- 
ployees combined with the purely academic tests contin- 
ued for several years and still exists in many instances. 
Practically no other types of tests appeared before the 
development of psychological test methods during the 
World War. 

3. General intelligence tests. The advisability of 
using general intelligence tests for determining mental 
alertness having been demonstrated by the experience in 
the Army during the World War, various civil service 
commissions and private industrial groups quickly saw 
the possibilities of utilizing such tests in the selection of 
employees. At the beginning most of the testing was 
done by using the actual tests which had been developed 
in the Army. Later on, with the realization that tests 
could be developed better adapted to the specific prob- 
lems at hand, general intelligence tests designed particu- 
larly for occupational use were more frequently worked 
out. There are scattered instances of such developments 
in private industrial groups. With the establishment of 
the Research Division of the United States Civil Service 
Commission in 1922, there began the development of a 
series of mental alertness tests for use in selecting public 
personnel. For a discussion of the advantages and limi- 
tations of general intelligence testing in the selection of 
employees, the reader is referred to Chapter VII. 

4. Special aptitude tests in terms of the job. The use 
of general intelligence tests was a distinct advance over 
the old academic types of tests for selecting personnel, 
but such tests possessed certain disadvantages which led 
in a short time to the development of specialized types 
of tests. The intelligence tests being general in nature, 
their terminology often seemed rather “far-fetched” in 
relation to the occupation for which they were used, even 
though they did test a quality important in the job. 
This made them in many instances seem abstract and 
impractical to both applicant and employer. It also 
proved to be true that for a considerable number of 
jobs the relationship between abstract intelligence and 
success on the job was not high enough to justify putting 
complete dependence on the general test for the selection 
of employees. Such facts as these led to the construction 
of tests patterned largely after the general intelligence 
tests but more specifically applying to the position for 
which they were to be used. In some instances these 
special tests are nothing more than specialized intelligence 
tests in the terminology of the job. In other instances, 
they measure certain additional qualities not 
measured in the general mental alertness test. Special- 
ized tests have the advantages of appealing to adminis- 
trators as being practical, of appealing to the applicant 
as being a fair test of the ability required of him, and of 
having generally a better selective value or validity than 
the abstract intelligence tests. 

An example of one of these specialized intelligence 
tests is one constructed by Telford and Moss for selecting 
policemen. This particular test has been used exten- 
sively by state and city Civil Service Commissions. The 
whole examination includes the following parts: 

(1) Observation. Measured by presenting the appli- 
cant with a picture of a collision between a street car and 
an automobile, requiring him to study it for a limited 
time, and asking him later to answer several questions 
without looking at the picture. 

(2) Memory. Measured by requiring the applicant 
to pick from a large number of photographs faces that he 
has seen before. 

(3) Comprehension. Measured by the applicant’s 
ability to answer questions based on printed selections 
from laws, ordinances, and police regulations. 

(4) Judgment. Measured by ability to answer such 
questions as the following,* in which the applicant has to 
select the correct one of four suggested solutions: 

If a policeman considers himself unfairly treated by 
his Sergeant, and gets no satisfaction when he explains 
to the Sergeant that he is not treated fairly, he should: 

Refuse to obey any orders given by the Sergeant. 

Invite the Sergeant to meet him when both are 
off duty so that they can settle the matter themselves. 

At the first opportunity report the matter to his 
Lieutenant or Captain. 

Immediately hand in his resignation. 

* Telford, Fred, and Moss, F. A., "Suggested Tests for Patrolmen," 
Public Personnel Studies, 1924, Vol. II, p. 112. 

It will be noted that this test as compared with a gen- 
eral intelligence test stresses those things which the ap- 
plicant might be expected to do after he assumes his job. 
His observation is tested in terms of a type of situation 
which he may actually be called upon to observe and re- 
port; his memory is tested in terms of faces, a type of 
memory which the policeman must often utilize in the 
recognition of persons to be apprehended; and his comprehension 
ability is tested in terms of material related 
to the job. 

5. Achievement or trade tests in objective form. 
These tests have been developed for positions in which 
high intelligence or special aptitude alone is not suf- 
ficient, but in which the applicant must have specific 
information and technical training along the line of the 
job itself. The achievement tests are designed to measure 
acquired information or knowledge about the job or 
technical skill. They are an outgrowth of the essay tests 
on the job, coming with the demonstration that objective 
methods have many advantages over essay methods in 
the measurement of achievement. 

Short-answer tests which have been developed for use 
in selecting chemists, bacteriologists, hospital workers, 
etc., are illustrations of this type of examination. The 
trade tests developed during the World War for measur- 
ing knowledge of mechanical and electrical work are also 
examples. These tests are primarily achievement tests 
and, in industry and personnel work, serve the purposes 
that educational achievement tests serve in the schools. 
Unlike general intelligence tests and aptitude tests, they 
are designed not simply to indicate ability to learn a job, 
but to measure the amount the applicant actually knows 
about the job at the time the test is taken. 



CHAPTER XVI 


Constructing a Psychological Test 
for Employment 


Before discussing specifically any of the tests 
which have been devised for use in industry, we 
shall consider the steps involved in the construction of a 
psychological test for use in this field. This is a very 
important part of a consideration of industrial tests, since 
the difficulties of carrying out these steps have constituted 
the chief drawbacks to a wider application of psychological 
measurements in dealing with problems of 
employee selection and control. Many of the points to 
be outlined have already been mentioned in the discus- 
sions of tests in the other fields. The importance of test 
validity, for example, and the importance of test relia- 
bility have been emphasized many times. These are two 
criteria for good tests which must be met no matter what 
the field of testing is. It is also true that many of the 
other procedures to be discussed in this chapter are just 
as applicable to tests in the field of school achievement, 
or in the field of intelligence testing, or in the field of 
personality testing. Because they are somewhat less 
likely to be understood and somewhat more likely to be 
neglected by the industrialist, we are emphasizing them 
here. 

Since most of the psychological tests of industry have 
been designed primarily for selecting employees or for 
indicating abilities which make for success on certain 
jobs, our outline of the steps in constructing an industrial 
test will be presented with such a test in mind. Also, 
since the contribution of psychologists to measurement 
in the selection of employees has most often been in the 
form of pencil-and-paper tests, similar in their general 
plan and structure to the pencil-and-paper intelligence 
tests so frequently applied in the academic world, our 
outline will fit best such a test. It should, however, be 
kept in mind that the general principles apply whether 
the test be for selection or for control of employees, or 
whether the test be of a pencil-and-paper type or of 
some manual-performance type. 

The steps necessary in the construction of an industrial 
test may be briefly outlined as follows: 

1. An analysis of the duties of the job and the qualifi- 
cations necessary for performing the job. 

2. The selection of types of tests and types of questions 
to be used. 

3. Decision on approximate length of the test. 

4. The construction of the test questions. 

5. The administration of the questions to a trial group, 
and the analysis of the test material on the basis of 
the trial. 

6. The selection and arrangement of final parts and 
questions to be used. 

7. The application of the test to groups for whom the 
test is designed. 

8. The establishment of norms and critical scores. 

These steps as outlined carry the problem all the way 
from the original need for a test to the final establishment 
of critical scores for making use of the test. A step which 
might be added to the above is one involving a survey 
and study of already available material related to the 
problem. In any test construction job, this should pre- 
cede the development or study of new material. What 
has already been done on related tests may give valuable 
information on a new problem. We shall now consider 
in somewhat more detail the various steps outlined 
above. 

1. Analysis of the job. It is quite obvious that it 
would be impossible to construct a measuring device for 
selecting people for a job without knowing anything 
about the job. The personnel psychologist, therefore, 
sets out to acquire a knowledge of the duties of the job 
and a knowledge of the qualifications necessary to per- 
form the duties, the latter probably being the more im- 
portant to know. As he approaches his problem, he 
may find that analyses have already been made of the 
job. This is true in a great many of our state and city 
civil-service organizations. Classification boards have in 
many instances undertaken to analyze all of the jobs 
coming under the state or city jurisdiction, and these 
analyses may be summarized for each job studied. Or, 
the psychologist working in a private industry, particu- 
larly if it be a large organization, may find a classifica- 
tion plan with an outline of the duties and qualifications 
for the various jobs involved. 

On the other hand, the psychologist who constructs the 
measuring device may be faced with the problem of mak- 
ing his own job analysis. He undertakes “a process of 
dissecting a job and describing its component elements.” 
The jobs which must be performed and the steps neces- 
sary for the performance of each of the operations are 
described in detail. The qualifications for the job are 
also studied, and such things as the working conditions, 
the incentives, and the morale may be included. The 
analysis of qualifications often includes an enumeration 
of the various mental traits which are necessary for the 
job. 

These analyses may be made by the tester in several 
different ways. Probably the best way is by a direct 
study of employees actually on the job; that is, a study 
involving actual observation of working conditions and 
working procedures. In other instances, job analyses 
may be based upon reports of executives and foremen, 
their opinions being gained in interviews with them or 
through questionnaires which they may answer. Ques- 
tionnaire methods of analyzing jobs have also been uti- 
lized with the employees themselves. 

To summarize the chief purposes of the job analysis, 
so far as the constructing of psychological measuring 
devices is concerned, we may say that the analysis (1) 
gives a basis for selecting the types of tests to be used, 
and (2) indicates the general level of ability to be aimed 
at in tests. 

2. Selection of types of tests and questions. The se- 
lection of types of tests may be viewed from two stand- 
points. We may think of the selection of tests in terms 
of qualities which they measure, in which case we may 
consider memory tests, reasoning tests, judgment tests, 
information tests, tests of observation, or tests of speed. 
On the other hand, we may think of types of tests in 
terms of whether they are to consist of true-false state- 
ments, multiple-choice questions, questions to be an- 
swered by single words, columns of items to be matched, 
etc. In most instances, the first step is a selection of 
type of test from the standpoint of quality to be meas- 
ured, this selection being based upon job analyses or 
job specifications. The type of test from the standpoint 
of nature of the questions is usually a subsequent procedure 
secondary to the qualities to be measured, the 
types of questions being selected on the basis of their 
suitability for measuring the qualities or traits selected 
for measurement. 

3. Length of test. It is important to make a prelimi- 
nary decision as to the length of the test at an early 
stage in the procedure, since this will determine the num- 
ber of questions to be constructed, the obtaining of trial 
groups, and a great many details of the further study 
of the material. The decision as to length should depend 
primarily upon reliability and validity. The length 
should be sufficient to insure a reliable test of the quali- 
ties to be measured, and yet should not be so long as to 
interfere with the validity through factors of fatigue or 
boredom. In the practical situation, it often happens 
that factors of available time for giving tests, available 
help for scoring tests, and available money for adminis- 
tering the tests, have to be considered. In an ideal 
situation, however, these should be secondary to the fac- 
tors that make a good test. 

4. Construction of questions. This step involves the 
actual making of the question material. The number 
of questions to be constructed is determined by the de- 
sired length of the test; the general nature of the ques- 
tions to be constructed is determined by the qualities to 
be measured and the level of ability of those to be tested. 
The subject-matter content of the questions is deter- 
mined by the specific job for which the test is being 
constructed. Ordinarily, a number of questions some- 
what in excess of the final number desired is constructed. 
This is to allow for elimination of the questions which 
prove to be poor testing material in the preliminary 
trials. So far as the nature and subject matter of the 
questions is concerned, in the initial construction we can 
be guided only by our job analysis and by previous ex- 
perience with questions in the same or a similar field. 

5. Analysis of test material from preliminary trial. 
After the material for a given test has been constructed, 
it should be administered to a trial group composed of 
individuals of known ability on the job. This usually 
involves giving the test to a trial group of individuals 
already on the job. Such a trial gives a basis for study- 
ing the various parts of the test and the various separate 
questions in these parts with a view to determining their 
actual value in differentiating the good and poor or the 
efficient and inefficient employees. It is customary to 
study the parts of the total examination for (1) validity, 
that is, relationship to efficiency of the employees of the trial 
group; (2) interrelationships among the various parts of 
the test; and (3) distribution of scores or grades given 
by the various parts. Ideally, each part of the test shows 
a high relationship between scores made on it and job 
efficiency; shows a good distribution of scores (not all 
bunched within a few points of each other); and is not 
so closely related to other parts as to be measuring the 
same thing. The separate questions of the tests are usu- 
ally studied from the standpoint of difficulty and “selec- 
tive value” or question validity. Each question should 
be studied for its difficulty by a tabulation of the number 
of people in the trial group who answer it correctly or 
incorrectly; and each question should be further studied 
for validity or selective value by a tabulation of correct 
and incorrect answers made by a group of good employees 
as contrasted with a group of poor employees. The last 
of these analyses, that for selective value, can usually 
be made by dividing the trial group into halves, thirds, 
or quarters on the basis of their efficiency on the job. 
After such an analysis, questions are retained for the test 
which show a difficulty within suitable range for the group 
to be tested, and which show a small proportion of errors 
for the good group and a large proportion for the poor 
group. 
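
The question-by-question analysis described above may be illustrated with a short Python sketch. It is offered only as one possible arrangement of the tabulation, not as a prescribed procedure. The function names, the measure of selective value (the proportion correct in the more efficient half less the proportion correct in the less efficient half), and the retention limits are all assumptions made for the illustration.

```python
def item_analysis(responses, efficiency):
    """Tabulate difficulty and selective value for each question.

    responses:  one list per person, 1 = correct and 0 = incorrect
    efficiency: one job-efficiency rating per person (higher = better)
    """
    n = len(responses)
    order = sorted(range(n), key=lambda i: efficiency[i])
    half = n // 2
    poor, good = order[:half], order[n - half:]  # split the trial group into halves
    table = []
    for q in range(len(responses[0])):
        proportion_correct = sum(r[q] for r in responses) / n
        p_good = sum(responses[i][q] for i in good) / half
        p_poor = sum(responses[i][q] for i in poor) / half
        table.append({
            "question": q,
            "proportion_correct": proportion_correct,
            "selective_value": p_good - p_poor,  # assumed index, see lead-in
        })
    return table


def retain(table, easiest=0.8, hardest=0.2, min_selective=0.2):
    """Keep questions of suitable difficulty and selective value.

    The three limits are hypothetical and would be set by the tester.
    """
    return [row["question"] for row in table
            if hardest <= row["proportion_correct"] <= easiest
            and row["selective_value"] >= min_selective]


# Made-up trial group of six employees and three questions.
efficiency = [1, 2, 3, 4, 5, 6]
responses = [
    [0, 1, 1],
    [0, 1, 0],
    [0, 1, 1],
    [1, 1, 0],
    [1, 1, 1],
    [1, 1, 0],
]
table = item_analysis(responses, efficiency)
print(retain(table))
```

In this made-up trial the first question separates the good from the poor employees, the second is answered correctly by everyone and so distinguishes no one, and the third is missed more often by the good group than by the poor.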

6. Final arrangement of test. On the basis of the 
analyses just discussed, the final test is put together. 
Those parts are selected which show the highest validity. 
Those questions are selected which prove to be of suit- 
able difficulty and suitable selective value. At this 
point, also, attention is given to the sequence of the 
various parts of the total examination, as well as to the 
sequence of individual questions within a part. The se- 
quence of parts of a test is determined largely on the basis 
of time required for answering, difficulty, and appeal to 
the person being tested. The sequence of the individual 
questions within a part is usually a difficulty sequence, 
questions being arranged in order of increasing difficulty. 
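
A minimal sketch of this final assembly follows, in Python. It assumes that the "relationship" between part scores and job efficiency is expressed as a Pearson correlation coefficient, which is one common choice and not one fixed by the text, and that difficulty is expressed as the proportion of the trial group answering correctly. All names and figures are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rank_parts_by_validity(part_scores, efficiency):
    """Return part names ordered from highest to lowest validity."""
    validity = {name: pearson(scores, efficiency)
                for name, scores in part_scores.items()}
    return sorted(validity, key=validity.get, reverse=True)


def order_by_difficulty(proportion_correct):
    """Arrange questions from easiest to hardest (increasing difficulty)."""
    return sorted(proportion_correct, key=proportion_correct.get, reverse=True)


# Made-up trial data for five employees.
efficiency = [1, 2, 3, 4, 5]
part_scores = {
    "memory": [2, 4, 6, 8, 10],
    "judgment": [5, 3, 4, 2, 1],
    "observation": [3, 3, 4, 3, 5],
}
print(rank_parts_by_validity(part_scores, efficiency))
print(order_by_difficulty({"q1": 0.4, "q2": 0.9, "q3": 0.6}))
```

The same `pearson` function could be applied to pairs of parts to check that no two parts are so closely related as to be measuring the same thing.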

7. Final application of test and establishment of criti- 
cal scores. Before a test can be applied in the actual se- 
lection of employees, we should know the significance of 
various scores on the test. In other words, we should be 
able to answer the question, “At what score level should 
a person be accepted for employment or rejected for 
employment?” This involves the application of the test 
in its final form to groups of known ability, and from 
such an application, the experimental determination of 
the points below which employees are unsuccessful. Such 
determinations are ordinarily spoken of, particularly with 
reference to academic tests, as the establishment of 
“norms” on the test. With reference to the industrial 
test, we more frequently speak of it as the establishing 
of critical scores, or “passing” points. 
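
One way in which a critical score might be determined experimentally is sketched below in Python. The rule used here, choosing the passing point that misclassifies the fewest members of a group of known ability, is an assumption adopted for the illustration; the text does not prescribe a particular rule, and the figures are invented.

```python
def critical_score(scores, successful):
    """Find the passing point that misclassifies the fewest employees.

    scores:     test scores of a group of known ability
    successful: True where the employee is successful on the job
    A person "passes" when his score is at or above the cut. Ties are
    resolved in favor of the lowest cut (an arbitrary choice).
    """
    best_cut, best_errors = None, None
    for cut in sorted(set(scores)):
        errors = sum(1 for s, ok in zip(scores, successful)
                     if (s >= cut) != ok)
        if best_errors is None or errors < best_errors:
            best_cut, best_errors = cut, errors
    return best_cut, best_errors


# Made-up group of eight employees.
scores = [40, 45, 50, 55, 60, 65, 70, 75]
successful = [False, False, False, True, False, True, True, True]
print(critical_score(scores, successful))
```

In practice the choice of passing point would also depend on how many applicants are available and on the relative cost of rejecting a capable applicant as against accepting an incapable one.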



CHAPTER XVII 


Measurement in the Selection 
of Employees 

I. The Purpose of Measurement in Selection 

Measurement in the selection of employees is 
generally aimed at selecting before employment 
those who will be efficient on the job, who will be relatively 
satisfied and happy on the job, and who will remain with 
the job long enough to make it worth while to train them 
into the work. Stated in another way, measurement in 
the selection of employees has for its main purpose the 
obtaining of workers who will efficiently and economically 
carry out the work of the organization of which they are 
a part. Just as the mechanical engineer wishes to know 
about the qualities of a machine before purchasing it, it 
is logical to expect that human engineers — personnel di- 
rectors, executives, and industrial managers — will want 
to know about the qualities of an applicant before em- 
ploying him, at least about those qualities which will de- 
termine his success. We have already considered at 
another point in this text the qualities in which the 
human engineer is most interested.¹ These are, briefly,
the natural abilities and special capacities requisite for 
performance on the job, the acquired skills and achieve- 
ments necessary to success, and in some jobs particular 


1 See Chapter VII. 





Measurement in Employee Selection 281 

personality traits. The primary purpose of any selection 
procedure is to differentiate between those who do and 
those who do not possess these qualifications. In addi- 
tion to eliminating those who do not meet the minimum 
requirements of the job, it is desirable that selection 
methods grade those who possess the qualifications in 
varying degrees. 

Both experience and experimental analysis of tradi- 
tional procedures employed in the selection of workers 
have shown that in many cases these procedures fail to 
accomplish their purposes. This has made employment 
managers and personnel directors alert to any improve- 
ments in techniques of measuring human qualities which 
will be of advantage in differentiating the good from the 
poor employee. All of the psychological measuring de- 
vices described in this text are of interest in relation to 
problems of selection. Almost all of them have at one 
time or another been tried, at least in an experimental way, 
in relation to the selection of employees, and many of 
them have been actually introduced as a permanent part 
of selection procedures. The lower susceptibility to error 
of tests of ability and achievement has made these tests 
the most used among the various psychological measure- 
ments in the selection of employees. 

II. Problems Related to Measurement 
in the Selection of Employees 

All of the problems which are generally involved in the 
construction and evaluation of psychological measuring 
instruments are, of course, involved in relation to their 
development for use in employee selection. Certain of 
the problems, however, are of particular importance in 
the application of measurement to personnel work and, 
therefore, deserve special mention at this point. 




1. Analysis of the job. As was pointed out in Chapter 
XVI, the first step in fitting men to jobs and in develop- 
ing measurement procedures for selecting the men is the 
making of a comprehensive study of job activities and 
requirements. Such an analysis is basic to the working 
out of any test or any other measuring instrument for pre- 
dicting vocational success. It cannot be doubted that 
some selection programs have failed because they have not 
been founded upon scientific and adequate job analyses. 
The shortcomings of job analyses which are to be used 
as a basis of developing employment tests and measure- 
ments are most likely to rest upon the following: 

(1) The analyses are often too hastily made and too 
superficial in nature. (2) Poor methods may be em- 
ployed in obtaining the information to be used as a basis 
for the analyses. Too often the information is gathered 
from foremen or supervisors who do not have an intimate 
enough contact with the actual job to be able to supply 
all the information necessary. In other instances, defi- 
ciencies may be due to the utilization of poorly con- 
structed questionnaires for obtaining information from 
employees on the job. (3) Persons making job analyses 
may not be qualified to undertake such procedures. Best 
results would seem to demand that analyses be made by 
persons trained in personnel work, persons with an ade- 
quate understanding of human traits and abilities, and 
persons at least somewhat familiar with the job processes 
to be studied. (4) A final shortcoming in job analyses 
is often the failure to arrive at traits and qualifications
necessary for success. Too many analyses are limited 
almost exclusively to statements of the duties and the 
working conditions with no indication of the mental ca- 
pacities and other traits essential for success on the job. 




It is to be recognized, of course, that the analysis of the 
capacities and traits essential for success is a difficult 
task, but is much more essential to the development of 
measurements for selecting employees than are mere 
statements of duties and working conditions. 

These various difficulties which arise in the making of 
job analyses are mentioned because of the fundamental 
nature of these analyses in the development of good tests 
and measurements for the selection of new employees. 
If it is not based upon definite knowledge of the job for 
which it is to be used, it is only by chance that a select- 
ing instrument turns out to be useful. 

2. Problems of test construction. The general problems
of test construction were covered in Chapter XVI;
they will not be repeated here. There have been some 
instances of difficulties arising on this score in the de- 
velopment of employment tests because persons un- 
trained in the technique of psychological measurement 
have attempted to construct the tests. In most of these 
instances, the tests have been doomed to failure, and they 
have often had a more serious and far-reaching effect in 
that they have discouraged the industrial organization 
from general extended use of objective test procedures. 
It is encouraging to find that there is today, on the part 
of industrial organizations and personnel agencies in the 
public service, a growing demand for individuals trained 
in the construction and evaluation of objective test pro- 
cedures. 

A second problem in relation to construction of em- 
ployment tests is that of inability to construct instru- 
ments of measurement which will reliably measure all of 
the traits desirable in an employee. The field of person- 
ality measurement, for example, has not yet reached a 




point at which it can generally supply reliable and prac- 
tical instruments for measuring personality traits in pros- 
pective employees. 

3. Establishing criteria for evaluating measurements 
in selection. The evaluation of employment tests and 
other selection methods involves a comparison between 
the test scores and the success of workers on the job. 
There is not a real basis for the introduction of a new 
selection procedure unless it can be demonstrated that 
the procedure actually measures success on the job, and 
the only way to do this is to have available a standard 
of accomplishment on the job. This standard of accom- 
plishment on the job is usually known as the criterion of 
vocational success, so-called because it is the criterion by 
which we judge the efficiency of our test or other measur- 
ing instrument. Many studies of tests have failed or 
have arrived at no definite conclusions because an ade- 
quate criterion of success was not available or was not 
established. To quote from a discussion of this point 
by Bingham, 

many a study of methods of selecting people for jobs 
has led to ambiguous conclusions because of the inade- 
quacy or unreliability of the criterion by which the 
methods were judged. All too often a research has 
passed through the laborious and expensive phases of 
making the job analysis, constructing ingenious tests, 
and giving the tests to numerous employees, before the 
investigator discovered that no adequate and reliable 
measure of individual achievement on the job was to be 
had.²

Criteria of vocational success are of various types.
Viteles has divided them into two groups: objective and

2 Bingham, W. V., “Measures of Occupational Success,” Harvard
Business Review, 5 (1926), pp. 1-10.




subjective. The former includes factors which can be 
definitely measured and expressed in objective terms. 
They may be quantity of output, quality of output, num- 
ber and cost of accidents or breakage, length of service 
on the job, rate of advancement in the job, compensation 
where it has a definite relationship to production, and 
performances on standard trade or job achievement ex- 
aminations. Objective criteria are to be desired in all 
programs of evaluating selection procedures, but in many 
instances they are impossible of attainment. Too often 
quantity and quality of output are not measurable in 
any simple quantitative terms. We might think, for 
example, of the relatively simple job of the clerical 
worker. His work is practically always of too varied a 
nature to be measured directly in terms of amount done. 
As a rule, only for those jobs in which the individual is 
working on a definite piece-rate basis will it be possible 
to establish an adequate criterion in terms of direct 
measures of quantity and quality of output. The subjec- 
tive criteria are usually criteria based upon supervisors’ 
ratings or estimations of success and efficiency. Too 
often these subjective criteria are so unreliable that com- 
parisons of test procedures with them mean little, if any- 
thing. 

4. Convincing employers and managers of the value 
of measurement in selection. In the introduction of 
psychological measurements into employee selection, psy- 
chologists have had to do a great deal of pioneer work in 
convincing industrial managers, executives, and others 
in charge of employee management. Until recently, very 
few executives and industrial managers have been trained 
in the test techniques which we have been considering, 
and they have been slow to adopt the measuring instru- 
ments which have more frequently been worked out in 




academic circles. As we pointed out in the historical 
consideration of psychological measurement in industry, 
the new measuring devices usually have had to await a 
demonstration that they would be of economic advantage 
to the employer. 

III. Measurement in the Selection of Employees 
Illustrated 

We noticed in our discussion of the use of mental tests 
(Chapter VII) that the public service has generally 
been somewhat in advance of private business and in- 
dustry in the utilization of the newer methods of em- 
ployee measurement. The United States Civil Service 
Commission, through its Research Division and through 
the efforts of the director of this division (Dr. L. J.
O’Rourke), has been outstanding in accomplishment in 
this respect. We shall examine some of its work as an 
illustration of scientific test development in selecting 
employees. The background of its work is of sufficient 
interest to warrant our noting it. The following is from 
a report by Herbert A. Filer, at one time Chief Examiner 
for the Commission: 

Notwithstanding the financial handicaps under which 
the Commission has always found itself, it has con- 
stantly endeavored to reach out and make use of im- 
proved methods wherever they could be found. In 
January, 1917, learning of work being done by Dr. 
Edward L. Thorndike of Columbia University and Dr. 
Walter Dill Scott of Northwestern University, in devis- 
ing tests for employment purposes, Mr. George R. 
Wales, then chief examiner of the Commission, wrote 
to these two men asking them for permission to see a 
complete set of the tests they had developed and re- 
questing any comment that they might deem of interest 




in connection with the Commission’s work. Dr. Thorn- 
dike replied that Dr. L. K. Frankel of the Metropolitan 
Life Insurance Company could furnish copies of the 
tests arranged for clerical workers; and upon request 
Dr. Frankel did so, with the understanding that the 
copies of the tests were furnished for confidential sight 
only and were to be returned to him after inspection. 

Dr. Scott replied, enclosing a preliminary announcement 
and two reprints of tests, stating that it would be seen 
that the work was simply experimental, and that it was 
desired that none of the material should be given out 
until it had been thoroughly tested. 

It will be recalled that war with Germany was for- 
mally declared on April 6, 1917, less than three months 
after this correspondence was initiated. Before and 
after the declaration of war, the Commission was liter- 
ally swamped with work incident to recruiting the civil- 
ian forces of the Government; hence no opportunity 
was afforded to accept an invitation extended by Dr. 
Scott to visit him and inspect his work; there was no 
time to follow up the subject in any direction. 

Soon after the armistice the Commission arranged with 
Dr. R. M. Yerkes to give the Army Alpha test to about 
one hundred of its own clerical employees. The results 
of this test were studied and the relationship found be- 
tween the Alpha scores and the individual efficiency 
ratings of the employees, as well as between the en- 
trance examinations of these employees and their effi- 
ciency. While this trial did not show that the Army 
Alpha was an improvement over the Commission’s es- 
tablished examinations for selecting clerical workers, 
it did indicate that tests might be devised according to 
the basic principles of the Alpha and similar tests, which 
would be advantageous for Government employment 
purposes. 

In April, 1919, Dr. John B. Watson, head of the De- 
partment of Psychology of Johns Hopkins Hospital, 
was engaged by the Commission as an expert examiner 
to start experiments with a view to determining whether 
the principles observed by psychologists in testing work




were applicable to civil-service examinations, and if so, 
to what extent. The experiments were made in the 
Baltimore post office, tests being given to several large 
groups of clerks employed in that office. The results 
indicated that neither the experimental tests nor those 
given by the Commission for entrance, bore a sufficiently
close relation to the efficiency ratings as furnished
by the post office authorities to establish definitely
their selective qualities. Either the tests were not ap- 
propriate or the efficiency ratings were incorrect, and it 
seemed probable that both were faulty. 

Dr. Watson then frankly stated that he would not 
have time to devote to continued research for the Com- 
mission, but that he was interested in the “merit 
system” of employment and recommended that a psy- 
chologist engaged in such work be consulted with a view 
to employment by the Commission as consulting ex- 
aminer. He mentioned the name of Dr. Beardsley 
Ruml, then associated with Dr. Walter Dill Scott in the 
Scott company. Accordingly, Dr. Ruml was communi- 
cated with and came to Washington for the purpose of 
making a survey of the Commission’s examination 
methods, with a view to formulating a constructive pro- 
gram for the improvement of those methods. 

Dr. Ruml’s report was dated June 10, 1920. He 
recommended in substance that, in order to make a saving
in the current expenditures of the Commission
sufficient to provide for the establishment of a small re- 
search unit, a rearrangement be made of some of the 
tests used in the more generally attended examinations, 
such as those for the Postal Service, so as to economize 
in the scoring and other handling of the papers without 
making radical changes in the content of the examina- 
tions themselves. He pointed out that upon the 
establishment of a research unit the efficiency of exam- 
inations in use could be checked against the demon- 
strated proficiency of employees, and that new and 
different methods could be substituted wherever their 
superiority could be proved. This program seemed wise 
to the Commission, and it was adopted. Dr. Ruml was 




engaged as a consulting examiner and is still retained 
in that capacity. 

Following the program adopted on the recommenda- 
tion of Dr. Ruml, the examination for clerks and car- 
riers in the Postal Service was revised after some 
experimentation and research. A small research unit 
headed by Mr. Guy Moffett, an experienced employee 
of the Commission, conducted this experiment. No 
radical change was made in the actual test, but by a 
rearrangement of the form and a revision of the arith- 
metic questions a considerable saving of time was ef- 
fected in the rating of the papers, and the time of giving 
the examination was reduced by one-half. Revisions 
were made in a number of other examinations along 
similar lines. For example, changes were made in the 
arithmetic and geography of the railway postal clerk 
examination, effecting a material saving in the work of 
the examiners. An entirely new examination for book- 
keepers was constructed. Wherever the type of exam- 
ination would permit, the questions were so arranged 
that they could be answered in a single word and 
scored by means of a key or stencil. These methods 
were applied to many examinations from time to time, 
including those for junior scientific positions and for 
auditor of income-tax returns. 

Experiments were undertaken on a new general cleri- 
cal examination to take the place of the examination 
previously given for clerks engaged in miscellaneous 
clerical tasks requiring no previous training or special- 
ized knowledge. Much study and experimentation were 
devoted to this examination before it was finally 
adopted and announced. It has been used for the field 
service in several of the thirteen civil-service districts 
and throughout the country for the departmental service
in Washington. This test requires only half as
much time to administer as the examination formerly 
given, and perhaps one-third of the time to correct, and 
it is believed to be a much more accurate measure of 
clerical abilities. 

On July 1, 1922, an appropriation became available 




especially for investigations looking toward the im- 
provement of examinations. Since that time, Dr. L. J. 
O’Rourke, a psychologist who was a member of the Air 
Service Psychological Research Laboratory and of the 
Civilian Advisory Board of War Plans Division, has 
been in charge of the Commission’s Research Section. 

He has done valuable work on the general clerical
examination and is conducting a number of other
research studies.³

Since its establishment the Research Division has con- 
ducted a number of studies looking toward improvement 
in the methods of selecting public employees. These 
have included studies of clerical tests, tests for post office 
workers, stenographer-typist tests, general adaptability
tests for measuring general mental ability of applicants,
oral examinations for investigators, and police tests.⁴
We shall examine the postal tests as illustrative of the 
type of measurement useful in selection of employees and 
as indicative of procedures essential to development of 
the instruments of measurement. The need for adequate 
means of selecting postal workers is obvious when we 
consider the numbers of applicants for positions and 
number of employees involved. Of all the departments 
of our federal government, the post office employs the 
largest number of workers. At the time the work was 
being done in the development of new examination 
methods for postal workers, the Civil Service Commission 
was examining between 60,000 and 80,000 applicants a 
year for positions of clerk and city carrier. It was for 


3 Filer, Herbert A., and O’Rourke, L. J., “Progress in Civil Service
Tests,” The Journal of Personnel Research, Vol. 1, No. 11, March 1923,
p. 484.

4 These studies are described in various reports of the Research
Division contained in Annual Reports of the United States Civil Service
Commission.




this large group of workers that the new methods of se- 
lection were studied and introduced, replacing old meth- 
ods of academic examination. 

On the basis of an extensive analysis of duties per- 
formed by the postal workers, several preliminary tests 
were constructed for trial. Each trial test was studied 
individually for its suitability for final inclusion in the 
new examination. Before any test was included in the 
final examination, it was determined that this particular 
test:⁵

1. Measures qualities which are essential for success in
   the work (validity).
2. Differentiates between degrees of ability.
   (a) Is focused at the correct difficulty.
   (b) Is of correct range of difficulty.
3. Is readily duplicated.
4. Is reliable: Is such that a competitor will make
   approximately the same grade on a later as on the
   original attempt.
5. Is objective.
   (a) In administration.
   (b) In scoring.
6. Is practicable.
   (a) In time required and facility in administration.
   (b) In time required and facility in scoring.
   (c) In cost of printing.

The validity of the tests. The degree to which each 
test indicates the relative ability of those tested was de- 
termined by study of the relationship between the test 
scores and a criterion of efficiency established for trial 
groups of distributors already working in post offices. A 

5 O’Rourke, L. J., “Report of the Director of Research,” in Forty-second
Annual Report of the United States Civil Service Commission,
1925, p. xliv.




combined criterion was used including (a) average number
of pounds of first-class mail distributed by each employee
during a period of six months, together with
amount of time in minutes spent on this distribution;
(b) records in a monthly case examination measuring
the rate and accuracy with which each employee distributed
mail into his distribution case; and (c) a subjective
criterion based upon foremen’s ratings of efficiency.
The first test selected was the one having the highest 
relationship to the efficiency criterion. The next test 
selected was the one showing also a high relationship 
with the criterion and, in addition, measuring important 
qualifications not measured by the first. The third test 
selected was the one best measuring important qualifica- 
tions not measured by the first two. The three tests 
finally selected included a general test measuring general 
ability requisite for the job and two special tests meas- 
uring more specific requisites for postal work. The na- 
ture of these is indicated by items reproduced from the 
“Sample Questions” furnished applicants (see page 296). 
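The selection procedure just described can be sketched in code. The block below computes a product-moment correlation between each trial test and the criterion, takes the test with the highest correlation first, and then prefers tests that still correlate with the criterion while overlapping little with tests already chosen. The scores, the test names, and the penalty rule (validity minus the strongest correlation with a test already chosen) are assumptions made for illustration; the report does not state the formula by which the Commission weighed "qualifications not measured by the first."

```python
from math import sqrt


def pearson(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def select_tests(tests, criterion, how_many=3):
    """Pick tests one at a time: validity first, then validity less overlap."""
    chosen = []
    while len(chosen) < min(how_many, len(tests)):
        def merit(name):
            validity = pearson(tests[name], criterion)
            if not chosen:
                return validity
            # Hypothetical penalty: strongest correlation with a chosen test.
            overlap = max(abs(pearson(tests[name], tests[c])) for c in chosen)
            return validity - overlap
        candidates = [name for name in tests if name not in chosen]
        chosen.append(max(candidates, key=merit))
    return chosen


# Invented scores for six employees; the test names are placeholders.
criterion = [52, 61, 58, 70, 66, 75]
tests = {
    "general": [40, 48, 47, 55, 50, 60],
    "sorting": [30, 33, 29, 41, 35, 39],
    "following_instructions": [12, 18, 15, 17, 21, 20],
}
chosen = select_tests(tests, criterion)
print(chosen)
```

With a trial group of realistic size, a multiple-correlation analysis would do the same job more rigorously; the greedy rule is used here only because it mirrors the order of selection given in the text.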

A final study to verify the value of the new examination,
made on a group of mail distributors in the Chicago
city post office, showed the relation between test scores
and efficiency of the employees indicated in Fig. 20.⁶
The vertical line of the figure 0-0 is the dividing line
on either side of which 50 per cent of the efficiency ratings
of the whole group fall. Each horizontal bar represents
one-quarter of the group tested. The top bar (A)
represents the distributors who made the highest 25
per cent of the test scores; the lowest bar (D) those who
made the lowest 25 per cent of the test scores. In each
bar, the part to the right of the vertical line 0-0 indicates
the percentage of that group above average in
efficiency; the part to the left of 0-0, the percentage
of that group below average in efficiency. From this
chart it may be judged that if a competitor makes a
test score as high as the highest quarter tested in the
trial group, the chances are 93 to 7, or about 13 to 1, that he
will be in efficiency above the average of those tested.
On the other hand, if he makes a score in the lowest
quarter, the chances are 100 per cent that he will be below
average in efficiency. Since it is probable that not
more than 26 per cent of competitors in postal examinations
will receive appointments (according to a statement
in the Civil Service report), it can be expected
that the new examination will greatly increase efficiency
of employees. Fig. 21 shows this increase graphically.

Fig. 20. Relation Between Test Scores and Efficiency of Mail
Distributors.

6 Ibid., p. xlvii.
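The odds quoted for the highest quarter follow directly from the percentage read off the chart. A small sketch of the conversion is given here; the function name is our own, and the only figure taken from the text is the 93 per cent above average.

```python
from fractions import Fraction


def odds_in_favor(percent_above):
    """Convert a whole-number percentage into odds (favorable : unfavorable)."""
    if not 0 <= percent_above < 100:
        # At 100 per cent there are no unfavorable cases, so odds are undefined.
        raise ValueError("percentage must be at least 0 and below 100")
    return Fraction(percent_above, 100 - percent_above)


odds = odds_in_favor(93)
print(f"{odds.numerator} to {odds.denominator}, about {float(odds):.1f} to 1")
# prints "93 to 7, about 13.3 to 1"
```

The lowest quarter, for which the chart shows 100 per cent below average, is the case in which odds cannot be stated as a ratio, which is why the text reverts to a percentage there.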











The reliability of the tests. The new postal tests rep- 
resent a marked improvement in reliability over the more 
subjectively graded tests which they replaced. The ob- 
jectivity of the test forms insures equality of results of 
scoring by different examiners. O’Rourke emphasizes
in the introduction of these new tests the importance of
sample questions, supplied the competitor before he takes
the examination, as an aid to reliability. Such a procedure
reduces the unequal advantages which may result
from the effect of practice and familiarity with test forms
on the part of some who may have taken the examination
before.

[Figure: bar chart headed “93% of New Clerks Above Average Efficiency
of Present Employees,” comparing employees from the new examination
with present employees, both divided into groups on the basis of
efficiency of present employees.]

Fig. 21. Improved Selection Made by New Postal Examination.

The difficulty of the tests. Scores made on the tests 
by the employees constituting the trial groups were 
studied to insure that the final examination be properly 
focused in difficulty. In the development of the general 
tests, each individual item or question was studied. Dif- 
ficulty values were determined on the trial groups, and 





items were selected so as to give a wide range of total 
scores. Items were included which were answered cor- 
rectly by from 9 to 85 per cent of the trial group. For 
each series of the examination 100 items were selected. 
These were scaled in difficulty from the easiest to the 
hardest and were so selected that the difficulty increased 
throughout the test in equal steps, as on a scale. 
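The scaling of items may be illustrated as follows. The sketch takes a pool of trial items, each with the per cent of the trial group answering it correctly, and picks items whose difficulties lie nearest to equally spaced targets between the easiest and hardest values named above (85 and 9 per cent). Spacing the steps in per cent correct is an assumption; the report may well have used a transformed difficulty value, and the pool shown is invented.

```python
def scale_items(pool, n_items=100, easiest=85.0, hardest=9.0):
    """pool: item id -> per cent of the trial group answering correctly.

    Returns n_items item ids ordered from easiest to hardest, chosen to lie
    nearest to targets spaced in equal steps between the two limits.
    """
    if n_items < 2:
        raise ValueError("need at least two items to form a scale")
    eligible = {i: p for i, p in pool.items() if hardest <= p <= easiest}
    if len(eligible) < n_items:
        raise ValueError("not enough eligible items in the pool")
    step = (easiest - hardest) / (n_items - 1)
    chosen = []
    for k in range(n_items):
        target = easiest - k * step
        nearest = min(eligible, key=lambda i: abs(eligible[i] - target))
        chosen.append(nearest)
        del eligible[nearest]  # each item may be used only once
    # Present the items in order of increasing difficulty.
    return sorted(chosen, key=lambda i: -pool[i])


# Invented pool: item "q37" was answered correctly by 37 per cent, and so on.
pool = {f"q{n}": float(n) for n in range(0, 101)}
print(scale_items(pool, n_items=5))  # ['q85', 'q66', 'q47', 'q28', 'q9']
```

For the full examination the same call with n_items=100 would require a pool of at least 100 eligible items, preferably many more, so that each target has a near neighbor.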

Practicability of the tests. In the study of the newly 
devised postal tests considerable attention was paid to 
the problems of construction of future tests, administra- 
tion of the tests, scoring of the tests, and cost of the new 
method. It might be conceivable that even increased 
validity of the new test would not justify its introduction 
at greatly increased time and expense for construction 
and administration. A statement from the original re- 
port on the tests indicates their advantage from the 
standpoint of these administrative problems. 

The new clerk-carrier examination can be scored 
and handled in 40 to 50 per cent less time than the ex- 
amination which it replaced. In an examination in 
which 60,000 to 80,000 compete, this means a saving 
of time required to score from 32,000 to 40,000 exam- 
inations in the old way. This will make it possible 
to make up the eligibility lists and certify eligibles in 
approximately half the time previously required to correct
the examinations.⁷
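The arithmetic behind the quoted saving can be checked in a few lines. The function below is our own; the percentages and numbers of competitors are those of the quotation. It suggests that the figures of 32,000 and 40,000 correspond to 40 and 50 per cent of the upper figure of 80,000 competitors, although the report does not say which base was used.

```python
def papers_saved(competitors, percent_saved):
    """Equivalent number of old-style examinations saved in scoring."""
    return competitors * percent_saved // 100


for competitors in (60_000, 80_000):
    for percent in (40, 50):
        saved = papers_saved(competitors, percent)
        print(f"{competitors:,} competitors, {percent} per cent: {saved:,}")
```

At 60,000 competitors the same percentages give 24,000 and 30,000, so the saving quoted is best read as the upper end of the range.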

These tests for selecting an important group of our 
federal employees have been discussed because they pre- 
sent the various problems which are met in the construc- 
tion and evaluation of employment tests, and because 
adequate methods of meeting most of the problems were 
devised. 


7 Ibid., p. lxxi.





SAMPLE QUESTIONS: POST-OFFICE TESTS

General Tests

Write the number of the best answer on the line at the
right.

1. The business of mail-order firms has been greatly
   increased by the introduction of (1) special delivery
   (2) parcel post (3) postal savings (4) airplane mail
   (5) lock boxes ____

2. Letters are delivered promptly by the post office so
   that the (1) office can be closed on time (2) inclosures
   will not be lost (3) mail will not be heavy (4) letters
   will not be damaged (5) public may not be
   inconvenienced ____

3. A fundamental point is one that is (1) final (2) drastic
   (3) emphasized (4) essential (5) difficult ____

4. The saying, “To do, one must be doing,” means most
   nearly (1) What you do, do thoroughly. (2) More is
   needed than good intentions. (3) Think before you act.
   (4) By our deeds we are known. (5) Well begun is half
   done. ____

In the sentence below, the word printed in italics has been
misspelled. It is spelled according to its sound. Write the
correct spelling of this word on the line at the right.

5. The plan was sankshunned by the committee. ____

6. Which one of these five may be applied to both books
   and magazines, but not to postman? (1) expected
   (2) reliable (3) accurate (4) authorized
   (5) published ____

In the following question, the first two words in capital
letters go together in some way. Find how they are related.
Then write a number to show which of the last five words
goes with the third word in capital letters in the same way
that the second word in capital letters goes with the first.

7. SACK is to MAIL as PURSE is to (1) money (2) suitcase
   (3) bag (4) owner (5) luxury ____

8. Over what body does the Vice President preside?
   (1) Senate (2) House of Representatives (3) Interior
   Department (4) Supreme Court (5) Cabinet ____

Answer the question from the quotation which follows it.

9. What one word in the quotation describes the person
   who is not interested in public welfare? ____

   “In almost every community there are certain men and
   women who are known as public-spirited. Others may be
   selfish and act only as their private interests seem to
   require.”

10. If 4 men can distribute 700 letters in 2 hours, in how
    many hours would they distribute 1,750 letters, at the
    same rate? ____

Sorting 

In the SORTING scheme below, each square represents
a box for mail going to the cities named in that
square. You will be required to study the sorting
SCHEME and then write after each city in the following
list the number of the box in which you would put
mail for that place. Look at the first name in the list,
“Harbur.” The number “2” is written after it because
Harbur is in the box numbered 2. “Leadwood” is in
box number 8, so “8” should always be written after
Leadwood.




Work straight down each column, taking the cities in 
order. You will receive no credit if you skip cities and 
scatter your answers. 

Study the sorting scheme for 10 minutes, to get it 
thoroughly in mind before beginning to write. 

Sorting Scheme

Box 1:   Red Bank, Painter, Carter
Box 2:   Harbur, Refuge, Concord
Box 3:   Denver, Rayburn, Sunset
Box 4:   Eastlake, Boston, Lakeview
Box 5:   Texan, Mesa, Grande
Box 6:   Randall, Lowell, Porter
Box 7:   Edison, Milbrook, Appleton
Box 8:   Leadwood, Fox, Morton
Box 9:   Wheeler, Forest, Sumter
Box 10:  Camden, Roswell, Chester

You may look back at the sorting scheme as often 
as you wish. 

You may not have time to finish the test. Do as 
much as you can in the time allowed. 


City         Box No.     City         Box No.

Harbur       2           Red Bank     ____
Leadwood     ____        Lowell       ____
Fox          ____        Carter       ____
Edison       ____        Denver       ____
Porter       ____        Sunset       ____
Eastlake     ____        Edison       ____
Grande       ____        Morton       ____
Painter      ____        Porter       ____
Milbrook     ____        Denver       ____
Boston       ____        Lakeview     ____
Camden       ____        Mesa         ____
Milbrook     ____        Appleton     ____
Grande       ____        Forest       ____
Randall      ____        Chester      ____
Wheeler      ____        Texan        ____
Refuge       ____        Concord      ____
Boston       ____        Rayburn      ____
Painter      ____        Lakeview     ____
Roswell      ____        Morton       ____
Sumter       ____        Eastlake     ____
Refuge       ____        Appleton     ____
Fox          ____        Texan        ____
Lowell       ____        Boston       ____
Concord      ____        Sumter       ____
Randall      ____        Painter      ____
Roswell      ____        Carter       ____
Red Bank     ____        Wheeler      ____
Mesa         ____        Edison       ____
Refuge       ____        Rayburn      ____
Fox          ____        Leadwood     ____

Following Instructions 

This is a test of your ability to follow instructions. 
All directions must be followed exactly as shown in this 
sample test. 

Below, at the left, is a list of post offices, called a 
SORTING SCHEME. After each of these offices is a letter. 
For example, after “Bowers” is the letter “A.” This refers
to the “A” in the key at the right, which reads
“A Felton 4.” The “A” after “Bowers” means that
mail for Bowers is routed by way of Felton. 

The numbers after the names in the key indicate the 
trains on which mail for those post offices must be 
placed. After “Felton” in the key you will find the 
number 4. This means that mail for Felton is sent on 
Train 4. Since mail for Bowers is routed by way of 
Felton, mail for Bowers, also, would be sent on Train 
4. 


SORTING SCHEME          KEY
                        Mail sent by way of:
Allen ...... C          A  Felton ... 4
Bowers ..... A          B  Union .... 8
Camden ..... C          C  Camden ... 6
Daly ....... I          D  Woods .... __
Denham ..... E          E  Allen .... __
Dover ...... C          H  Turner ... 9
Felton ..... A          I  Dover .... __
Malter ..... D
Turner ..... H
Viola ...... B
Woods ...... A
Union ...... B


YOU MUST FOLLOW DIRECTIONS EXACTLY AS GIVEN. 
Make your numbers and letters clear, to avoid mistakes. 

Look at the name “Woods” in the key. It is not
followed by a number. Write after it the letter which
you find after “Woods” in the sorting scheme. Your
key will now read “Woods A.” Find the letters after
Allen and Dover in the sorting scheme and write them 
after those names in the key. 

Never put numbers in the sorting scheme. 

On the line after each of the following offices, write 
the number of the train on which you would send mail 
for that office. 

To find the number which should be written after 
Viola, look for Viola in the sorting scheme. After it 
is the letter B. This refers to key B Union 8, and 
means that mail for Viola is routed through Union on 
Train 8. 

After Denham is the letter E. This refers to key E 
Allen C, and means that mail for Denham is routed
through Allen by way of C, and key C reads Camden,
on Train “6.” Write “6” after “Denham” in the list
below. Now write the train numbers after the others. 

Viola ... 8 Bowers Turner 

Denham Daly Malter 

You now receive Bulletin No. 1:

Changes in Routing 

Never change the letter before the name in the key. 
When a letter or number is changed, it is always the 
letter or number after the name. 




(Make changes in both sorting scheme and key if 
the names are in both.) 

Woods by way of C 
Dover by way of B 

To make the change for Woods, cross out the “A” 
after Woods in the sorting scheme and write “C.” 
Then your sorting scheme for Woods should read: 
“Woods A C,” with the “A” crossed out. This means that mail
for Woods is now sent by way of “C Camden 6.” Next look
for “Woods” in the key, and change the “A” after it to “C.”
Make the change for Dover so that your sorting scheme will
read “Dover C B” and the key will read “I Dover C B,”
with the “C” crossed out in each.

After making the above changes, write the number 
of the train on which you would send mail for each of 
the following offices: 

Woods    Camden    Dover

Felton    Union    Allen

Next you receive Bulletin No. 2: 

Changes in Routing 

(Make changes in both sorting scheme and key if 
the names are in both.) 

Change key C to read: C Camden 2 

Change key A to read: A Train 5 
Felton by way of B 

Change key E to read: E Allen 7 
Allen by way of E 

To make the change for key C, cross out the 6 after
Camden in the key and write “2,” because the train for
Camden has been changed from “6” to “2.”

To change key A, cross out “Felton 4,” and write
“Train 5.” This means that mail for offices marked “A”
is no longer sent through Felton but is routed direct on
Train 5.




Write the number of the train on which you would 
send mail for: 

Bowers Felton Daly 

Dover Denham Allen 
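
The look-up procedure in the sample test above can be sketched in code. This is only an illustrative reading of the printed instructions, not an official scoring key: the dictionaries, the helper names (`fresh_tables`, `train_for`, `reroute`), and the assumption that a "by way of" change touches a key entry only when that entry still carries a letter are all supplied here for illustration.

```python
# Tables transcribed from the sample test. The data layout, helper names,
# and computed answers are illustrative assumptions, not an official key.

def fresh_tables():
    """Return (scheme, key) as printed, with blank key entries filled in."""
    scheme = {
        "Allen": "C", "Bowers": "A", "Camden": "C", "Daly": "I",
        "Denham": "E", "Dover": "C", "Felton": "A", "Malter": "D",
        "Turner": "H", "Viola": "B", "Woods": "A", "Union": "B",
    }
    # key letter -> (name, train number, or onward letter, or None if blank)
    key = {
        "A": ("Felton", 4), "B": ("Union", 8), "C": ("Camden", 6),
        "D": ("Woods", None), "E": ("Allen", None),
        "H": ("Turner", 9), "I": ("Dover", None),
    }
    # First instruction: copy the scheme letter into each blank key entry.
    for letter, (name, value) in key.items():
        if value is None:
            key[letter] = (name, scheme[name])
    return scheme, key


def train_for(office, scheme, key):
    """Follow scheme letter -> key entry until a train number is reached."""
    letter = scheme[office]
    seen = set()
    while letter not in seen:
        seen.add(letter)
        _name, value = key[letter]
        if isinstance(value, int):
            return value
        letter = value  # routed onward by way of another key letter
    raise ValueError(f"routing loop for {office}")


def reroute(office, letter, scheme, key):
    """Apply 'office by way of letter' (assumed: only letter-valued key entries change)."""
    scheme[office] = letter
    for k, (name, value) in key.items():
        if name == office and not isinstance(value, int):
            key[k] = (name, letter)


def answers(offices, scheme, key):
    return {o: train_for(o, scheme, key) for o in offices}


scheme, key = fresh_tables()
base = answers(["Viola", "Denham", "Bowers", "Daly", "Turner", "Malter"],
               scheme, key)

# Bulletin No. 1
reroute("Woods", "C", scheme, key)
reroute("Dover", "B", scheme, key)
after_b1 = answers(["Woods", "Camden", "Dover", "Felton", "Union", "Allen"],
                   scheme, key)

# Bulletin No. 2 (applied in the printed order)
key["C"] = ("Camden", 2)
key["A"] = ("Train", 5)
reroute("Felton", "B", scheme, key)
key["E"] = ("Allen", 7)
reroute("Allen", "E", scheme, key)
after_b2 = answers(["Bowers", "Felton", "Daly", "Dover", "Denham", "Allen"],
                   scheme, key)

print(base)
print(after_b1)
print(after_b2)
```

The sketch reproduces the two answers worked out in the test itself (Train 8 for Viola and Train 6 for Denham). The remaining values follow from the same rule and should be taken as this sketch's reading of the bulletins.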



CHAPTER XVIII 


Psychological Measurement in the 
Control of Employees 

One of our most outstanding industrial psychologists,
urging engineers and industrial managers to
approach their problems in a mood of exploration, re- 
cently stated, “But we must not leave behind our con- 
ceptions of science, for without them we cannot go far. 
One concept indispensable for our purpose is that of 
measurement. Another is correlation. These are basic 
to all others, for scientific research is the precise deter- 
mination of relationships between variables.” 

The variables which the industrial manager should be 
interested in measuring, and which the psychologist can 
help him to measure, include the aspects of a person's 
behavior which are significant in his work and the factors 
or conditions which determine the worker’s output and 
produce the satisfactions and happiness which he should 
have. 

Bingham has outlined these aspects of behavior and 
factors influencing behavior according to the accompany- 
ing table: 

1. The tempo at which he chooses to work. 

2. The mechanically imposed tempo to which the work- 
man can adjust himself ; or the number of machine 
parts (e.g., spindles) or machines (e.g., looms) which 
he is able to operate. 

3. Avoidance of errors, spoiled work, and accidents. 

303 



804 Measurement in Employee Control 

4. Speed and efficiency in eliminating disturbances and 
irregularities in machine or materials. 

5. Quality of product, in the narrower sense. 

Let us stop for a moment to consider which of the in- 
struments of measurement in possession of the psycholo- 
gist may help to solve some of these problems. A peek 
into a modern psychological laboratory shows us tachis- 
toscopes, ergographs, chronoscopes, instruments for meas- 
uring reaction time, sphygmomanometers, and metabo- 
lism machines. 

Perhaps the worker is too slow in the performance of 
his job. A micro motion camera and time and motion 
studies show that there is too much wasted motion in his 
method of doing the task. His efficiency is measured in 
terms of reductions of waste motions. Perhaps another 
worker always feels tired or worn out at the end of his 
day’s work. A metabolism machine will measure his lack 
of energy production and will tell whether he needs a 
bit of thyroid to speed up. Still another worker may
get irritated at only slight annoyances or “blow up”
when something goes wrong. A blood-pressure instrument
may reveal that this worker has a definite physical
disturbance which is the basis for his apparently neurotic 
condition. Another worker may be particularly prone 
to accidents in the operation of his machine or may find 
that he is unable to coordinate his own movements with 
the speed of the machine. An instrument for measur- 
ing his reaction time may reveal that his movements are 
naturally too slow for the type of work he is trying to 
perform. 

In addition to these mechanical precision instruments,
the psychological laboratory utilizes tests of various
general and special mental traits to measure fitness of
the worker for the job; rating scales which make
supervisory ratings of personal efficiency as objective
as possible; learning curves which show rates of progress
to be expected in teaching employees; and diurnal curves
of efficiency which show variations in output to be
expected for a day of the given length. Finally, psychology
has measured many of the environmental conditions in
which the industrial manager is interested. Examples are
studies of distractions, in which not only the effect on
output but also the cost in energy consumption or effort
has been measured; and studies of the effect on behavior
of external environmental factors such as light, weather,
and season, of the best routes or organization for tasks,
and of habit influences when several tasks are performed.

There are, of course, many problems for which psy- 
chological measurement has no answer. There are 
needed in many of these instances more precise units, 
more accurate scales with which to measure human traits 
and with which to detect changes in various aspects of 
human reactions, but more use could be made of the 
mental and physical yardsticks at hand. Even the 
crudest of them are better than trusting to personal 
impressions, ordinary common sense, or intuitive judg- 
ments. 

To show some of the contributions of psychological 
measurement in industrial management we shall discuss 
three examples: first, an example in accident prevention 
in a public utilities company; second, an example of 
improvement in efficiency through time and motion 
study; and third, an example of the use of psychological
measurement in management in a large department
store. These three examples have been selected because
they show variation in the types of measuring devices
used, and variation in the types of problem which may
be met.


I. Example of Measurement in Relation to 
Accident Prevention 

The organization which we shall use as an example 
here is the Boston Elevated Railway. The study dis- 
cussed was one conducted under the direction of Bing- 
ham and begun in 1926. At the time the study was 
started, the problem that confronted the management 
was the high cost of accidents. The railway safety record 
was fairly high, but a million dollars a year was being 
expended for personal injuries, property damage, and 
claims. No radical improvement at that time was to 
be expected from better methods of hiring new em- 
ployees, because there was almost no labor turnover. 
The problem lay with the experienced operators, in fact 
with a relatively small percentage of these, for it was 
found on study that about 20 per cent of the operators 
caused practically all of the accidents. In other words, 
there were about 20 per cent who might be termed 
“accident-prone.” 

Bingham began his study by first making a job analysis 
and by collecting all the useful data which might already 
be at hand. When he began work on the problem he 
found already available the individual records of acci- 
dents in full detail, personal history information which 
the employees had supplied on their application blanks, 
medical examinations of the employees made at the time 
of hiring, records of earnings, overtime, absences, errors 
in making out cards, errors in turning in cash receipts, 
disregard of rules, and a long list of other things which 
one might expect to have available in any electric railway
company. Bingham set about the problem with the
object of finding whether any of these data were related
to accidents; in other words, whether any of the available
data could serve as a human yardstick so far
as proneness to accidents was concerned.

He did find, among the data which he had available, 
some which were apparently related to the tendency to 
accidents. Among other things he found that each man 
had a “coasting record.” These records represented the 
time during each run when neither brakes nor power 
were applied to the car. The motormen were instructed 
to coast when possible to conserve wear and tear on the 
equipment and to economize on power. Each man’s 
record was computed and posted periodically. Bingham 
suspected that these records might reveal those who were 
most competent and conscientious and should conse- 
quently have fewer accidents. This proved to be true. 
The relationship between these efficiency records and ac- 
cidents was not close, but it was real. 

One more example of an objective measure which 
Bingham found to have a positive relationship to acci- 
dent susceptibility was blood pressure. Poor health as 
shown by the blood-pressure records was evidently a 
causal factor in accidents. Bingham’s method was to go 
through the list of measurements which he had available, 
to correlate each with accident occurrence, and to find 
which ones had any relationship — low or high — to 
accidents. 
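
The screening step described above, correlating each available measure with accident occurrence, can be sketched as follows. The figures are invented purely for illustration and are not Bingham's data; the measure names `coasting_pct` and `systolic_bp` and the helper `pearson_r` are likewise assumptions introduced here.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical records for eight motormen (invented numbers).
accidents = [1, 3, 1, 5, 2, 2, 6, 4]
measures = {
    "coasting_pct": [32, 28, 35, 20, 25, 30, 18, 22],
    "systolic_bp": [120, 135, 118, 150, 128, 130, 155, 140],
}

# Correlate every available measure with the accident count.
correlations = {name: pearson_r(values, accidents)
                for name, values in measures.items()}

for name, r in correlations.items():
    print(f"{name}: r = {r:+.2f}")
```

A measure is worth keeping as a "yardstick" when its coefficient differs reliably from zero, whether the sign is negative (as one would expect for coasting) or positive (as for blood pressure). With real records the coefficients would be much smaller than these toy numbers suggest, since the text notes that the relationship was "not close, but it was real."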

After a study of this nature by which he was able to 
discover some of the important factors to be measured, 
Bingham set about actual work with individual motor- 
men. The men selected for first attention were those 
whose previous records showed them to have had the 
largest number of accidents. In each instance the details 
of the accidents were fully studied for clues as to probable
causes. In each of these cases one of the inspectors
actually observed the man as he performed his job. The 
operator’s whole record was studied for indications of any 
divergences from the usual attitude toward his job. 
Finally, the man involved in an accident was interviewed. 
If necessary a new medical examination was made. 
With all these facts and all these measurements at hand, 
a diagnosis was made, an individual one for each case, 
and a definite plan of cure prepared. In most of the 
cases the problem was solved in such a way that these 
men could be kept on the same kind of work. A ma- 
jority of the causal factors of the accidents occurring in 
this company were factors which could be eliminated 
provided the accident-prone employees were sufficiently 
studied by the definite means of measurement at hand. 
We may summarize Bingham’s work on this particular 
problem by quoting two paragraphs from his report:¹

It so happened that a number of these accident-prone 
men were at the time operating on the Harvard-Dudley 
line on Massachusetts Avenue. This is one of the most 
difficult routes in the Boston area. It crosses many 
main thoroughfares. The cars are huge and fast. Up 
to November, 1926, no month had gone by with fewer 
than 48 accidents on this route. But in December, the 
next month after this work with the accident-prone men 
began, there were 23. As a consequence, the following 
year the road as a whole achieved a 17% reduction in 
collision accidents. A similar further reduction was 
made in 1928; for this achievement, the Railway was 
awarded the Anthony N. Brady Memorial Gold Medal 
in the national competition among street railways for 
the best record in accident reduction. 

Of all the kinds of factors that proved to have significant
relationship to a motorman’s actual performance
in avoiding accidents, those which showed the
closest relationship were matters of physique, health,
and eye-sight. That was to be expected. Almost
equally close were the factors of mental attitude toward
the responsibilities of the job. Next in importance were
habits of operation and items of knowledge about the
job: specifics which can be taught. Then came the
factors which may be grouped under the heading of
aptitudes for this kind of work. In other words, we
found that physique and health, right attitude, knowledge
and skill, and natural aptitude for work of this
character, are all factors in the equation.

¹ Bingham, W. V., “The Science of Work,” Technology Review, July
1932, p. 374.
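
The size of the improvement on the Harvard-Dudley line reported in the quotation can be checked with a line of arithmetic. The sketch below uses only the two monthly figures given in the report (48 and 23); the helper name `percent_reduction` is an assumption introduced here. Since 48 was the lowest monthly count before the work began, the result is a minimum for the drop relative to any earlier month.

```python
def percent_reduction(before, after):
    """Relative drop from `before` to `after`, in per cent."""
    return 100.0 * (before - after) / before


# 48 = fewest accidents in any month up to November 1926 (from the report)
# 23 = accidents in December 1926 (from the report)
drop = percent_reduction(48, 23)
print(f"Reduction of at least {drop:.1f} per cent")
```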

II. Time and Motion Study 

For most industrial operations there are several ways 
of performing the job. There are varying numbers of 
motions that may be made, and there are various orders 
in which given motions may be made. We cannot be 
sure without study whether all the motions a worker 
makes are necessary or whether the order in which he 
makes them is the best order. The worker himself can- 
not tell whether he is doing the job in the best way, and 
even an expert watching him cannot be sure of his judg- 
ment. Furthermore, the speed at which motions are 
ordinarily made is such that it is difficult to observe 
them and to get an accurate idea of their order. For 
these reasons it is desirable to have a method of meas- 
uring these motions — a permanent record — one which 
can be studied at leisure. 

A technique which has been widely used for this is the 
continuous photograph method. A moving film is used 
to record the motions as the worker performs his task. 
Since in most instances only the motion of a small part 
of the body (such as the motion of the hand) is of inter- 
est at one time, a small battery lamp is usually mounted 
on the part to be studied. The camera with the shutter
open is then placed in front of the person. The figure
of the individual may be blurred, but the lamp gives a 
continuous streak of light on the film which constitutes 
a permanent record of the motion of the hand or other 
part of the body. To indicate the time duration of each 
motion, a time record is made on the film. In order to 
study the third dimension or depth of the motions, pic- 
tures are usually taken from more than one angle. 
Motion models in wire are sometimes constructed from 
the various photographs of a motion. 

Motion Study in Bricklaying. The classic example 
of increased efficiency produced by motion study is 
Gilbreth’s work with bricklayers. For hundreds of 
years bricklayers have been lowering and raising the 
upper portion of the body with every brick they laid. 

It developed as a result of careful investigation that the 
bricklayer was using something like eighteen different 
motions, many of which were unnecessary. For in- 
stance, he picked up a brick with his left hand and 
turned it around until he found the best face of it to 
place toward the surface of the wall. Then he reached 
for the mortar with his right hand and, after spreading 
the mortar, put the brick in place on the wall. The 
workers were trained to reach for the brick and mortar 
simultaneously. There was no particular point in leav- 
ing the right hand idle while reaching for the brick with 
the left. These two operations could just as well be 
done at the same time, because neither one required
accurate coordination and they could perfectly well be
done automatically. Then, to eliminate the inspection
of each brick to determine its best face, it was arranged
to have this done by an unskilled worker who arranged
the bricks in a “packet” with the best face always in
the same direction. The bricklayer then did not have
to make any discrimination on that point. The bricks
were supplied, furthermore, at waist height, so that it
was not necessary to stoop every time. The men were
taught not to waste time picking up any mortar that
they happened to drop. By this and other devices, the
initial eighteen motions were reduced to about five, and
the average worker, instead of laying 120 bricks per
hour, was enabled to lay 350. This is typical of one
of the earlier motion studies with rather striking results.²
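
The gains reported for the bricklaying study can be restated as simple ratios. The sketch below uses the figures quoted above, which the source itself gives as approximate ("something like eighteen" motions, "about five"); the variable names are assumptions introduced for illustration.

```python
# Figures as quoted in the passage; both motion counts are approximate.
MOTIONS_BEFORE, MOTIONS_AFTER = 18, 5
BRICKS_BEFORE, BRICKS_AFTER = 120, 350   # bricks laid per hour

# Share of motions eliminated, in per cent.
motion_cut_pct = 100 * (MOTIONS_BEFORE - MOTIONS_AFTER) / MOTIONS_BEFORE

# How many times the original hourly output the trained worker reaches.
output_ratio = BRICKS_AFTER / BRICKS_BEFORE

# Average time spent on each brick, in seconds.
secs_before = 3600 / BRICKS_BEFORE
secs_after = 3600 / BRICKS_AFTER

print(f"Motions eliminated: {motion_cut_pct:.1f} per cent")
print(f"Output multiplied by: {output_ratio:.2f}")
print(f"Seconds per brick: {secs_before:.1f} before, {secs_after:.1f} after")
```

The point of the comparison is that output rose nearly in proportion to the motions removed: cutting the motions to roughly a quarter of their number cut the time per brick to roughly a third.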

A great many other industrial operations have been 
studied by the motion study method since its introduc- 
tion. Such study has in every instance led to improve- 
ment in working efficiency. Mere mention of some of 
the operations studied will indicate the extensive applica- 
bility of the method. There have been studies of the 
motions involved in folding handkerchiefs, dipping 
chocolate-coated candies, packing candy, metal polishing, 
typewriting, assembling carburetors, packing potatoes, 
coal mining, and housework in the kitchen. 

III. Measurement in Department Store Management 

As an example of the use of psychological measure- 
ments in industrial management in the field of depart- 
ment store work, we quote from the work of V. V. Ander- 
son at Macy’s department store in New York City. The 
reader will note the many yardsticks for human abilities 
employed by Dr. Anderson in his study of the “problem 
cases.”³

About 20 per cent of the employees of mercantile es- 
tablishments — and this is probably true of other busi- 
ness and industrial organizations — are what may be 
called “problem” individuals. That is, as personnel 
material, they are either liabilities or potential liabilities 
to the business man. It is from this group that are
drawn the repeated transfers from job to job, or resignations,
or lay-offs. These are the work failures that
in the majority of cases are a drag on any organization.

² Burtt, Harold Ernest, Psychology and Industrial Efficiency,
Appleton-Century Co., New York, 1929, pp. 96-6.

³ Anderson, V. V., Psychiatry in Industry, Harper & Bros., New York,
1929, pp. 8, 10, 12, 18, and 26.

As the employer sees these individuals they are pro- 
duction problems, or chronic health problems, or chronic 
attendance problems, or serious attitude problems, or 
marked disciplinary cases, and the like, and from his 
viewpoint he is better off without them. 

As the psychologist sees them, they all present dis- 
tinguishing physical and mental characteristics that 
underlie and explain not only their job maladjustments,
but faulty adjustive efforts and failures in other
life situations. In studying the life histories of these 
individuals, and analyzing their careers, he is impressed 
with three outstanding causative factors commonly un- 
derlying work failure — (a) a maladjusted personality; 
(b) specialized job disabilities; (c) faulty physical
conditions. 

Of course, such cases can be dumped wholesale into 
the community to shift for themselves. This might, 
after all, be a simple and easy way of solving the entire 
problem, for it is common enough for the Management 
to say, “We are not running a charitable institution; 
why should we feel any responsibility in this matter?” 

But there is another viewpoint: that many of these
individuals may, through proper study and treatment,
be adjusted and become assets to the employer, thus
cutting down turnover among employees and increasing
production efficiency. A few case summaries presented
in this study demonstrate what can be
accomplished in this direction, and as our statistics
show, a sufficiently large number of problem cases will
improve under treatment to make the application of
such measures profitable, not only in terms of human
salvage, but in terms of dollars and cents.

The methods employed in conducting a complete 
study can, for practical purposes, be roughly classified 
into the social history, the job behavior study, and the 
physical and mental examination. These four fields 
go together to make up a thorough-going picture of the
whole individual: his behavior toward his work and toward
major life situations, as well as those influences
(constitutional, or home, or school, or work) that have
contributed most to his career. . . .

The psychological method involves the objective and 
quantitative evaluation of certain measurable mental 
qualities. It is of enormous aid to the diagnostician, in 
that it furnishes him accurate and concrete evidence of 
the capacities of the individual, and enables him to 
compare the performances of individuals with each 
other, and with standards obtained from careful studies
of large groups. 

Illustrative Case. Female, aged 24 years. 

Problem: Referred by department head with the 
following statement: “She should be tested, as she is 
dull and stupid. I think she is poor material, and I
have already made out a lay-off slip and sent it through 
to the general manager’s office, but I should be glad to 
have you study her.” 

Physical Findings: Height, 5 feet 3 inches. Weight,
117 pounds. Physical examination disclosed some in- 
flammation of the gums. She is in need of dental work. 
Her vision is defective and she should wear glasses. 
She has lost twenty-four pounds in the last four months 
and at the present time complains of loss of appetite, 
fatigue, and “needing a rest.” She has had a good 
many consultations with the hospital physicians re- 
cently for various minor complaints — “headaches, indi- 
gestion, insomnia, nervousness, run-down.” Physical 
examination does not disclose any positive findings. 

Mental Findings — Psychological study: Individual 
secures an intelligence quotient of 88, which gives her a 
classification of dull average intelligence. She is some- 
what slow in performing her psychological tests, but is 
accurate and painstaking. Her attitude was particu- 
larly good toward the examination. 

Personality study: She has a fair sales appearance.
Is pleasant, agreeable, seems most anxious to please, is
neat, fairly well dressed, but is not attractive or good-looking.
She wants to succeed at her work but is over
anxious and fearful. She is naive and too shy and retiring
to be convincing. She has suffered much from
insomnia, and periods of acute depression, worrying
over her home troubles; is lonesome and homesick, yet
realizing that she would not be happy to return home 
under present home conditions. She is too shut-in to 
make friends easily. So since coming to this country 
from Great Britain she has established no friendships. 
As a consequence, she feels out of place, disheartened, 
and cries a great deal. Of late, there has developed a 
great fear of losing her job, and inasmuch as she is 
totally dependent upon her weekly salary, she does not
know what will become of her in case she does lose it. 
She does not seem to know how to make headway, and 
appears to be bewildered by the size and complexity 
of things in a great store — having had no experience in 
the past that would teach her how to meet such prob- 
lems. 

She is responsive, polite, and courteous, but obviously 
has a great deal of pessimistic reverie, with a marked 
lack of self-confidence. She becomes greatly frightened 
when admonished or criticized by her superiors, and 
has frequently to leave the floor to go to the girls’ room.

Job behavior: Her contact with customers is good, 
though not particularly effective. They like her be- 
cause she is anxious to please. However, on the floor 
and at her work she seems detached a great deal of the 
time, day dreaming, mentally preoccupied, and slow, 
but when awakened out of her reverie she is responsive 
and obliging. She seems timid in her relations with the 
other girls, and does not appear to know how to make 
herself easy and natural with them. Her reaction to 
her superiors is more that of a frightened child. Her 
knowledge of stock is inadequate. 

Social Findings: Her mother is dead and her father 
married the second time. The stepmother has been 
very unkind to her. Her only brother was killed in the 
war, and her two sisters are married. She left home 
because of unhappy home relations, and determined to
make her way in this country. Her education was of
grammar-school character, but she has had several
years’ experience in selling in a store in a small town.

Conclusions and Remarks: We do not recommend
lay-off in this case, but do advise adjustment work. 

We have here a young woman of good enough intelligence
and fair enough sales appearance, who, providing
we can secure an adjustment of her personality
disorders, will make an acceptable sales clerk, inasmuch
as she has a very responsible and dependable nature,
is courteous and anxious to serve, and will give her
best to the department. She is in need of a physical 
and mental hygiene regime. Her weight needs to be 
built up, her insomnia overcome, and her depression 
cleared. It is believed that if this can be done her de- 
tached attitude and obsessive reveries which seem to be 
the big factor in handicapping her sales ability will dis- 
appear. 

Follow-up: After the report was made, the buyer 
took a different attitude toward this employee. She be- 
came more friendly and understanding in her relations 
with her. She talked with her and sought to be help- 
ful. Realizing that much of the problem was shyness 
and timidity, based upon a strong inferiority back- 
ground, she made a different approach to her and sought 
to aid and guide her rather than to intimidate and 
frighten her when she made errors. She kept in close 
touch with the psychiatrist, who sought to give her an 
insight into her personal problems, and who worked 
out with her a more satisfactory way of meeting them.
Her insomnia disappeared altogether within the first 
week. She regained her appetite, began to increase in 
weight and improve in her attitude toward herself and 
the difficulties in her job. Self-confidence increased. 
People seemed to be interested in her. The world took 
on a brighter outlook. Her sales record improved. She 
got acquainted with some of the girls, went out to
luncheon with them. Began to drop into the rest rooms
at the lunch hour, and took some part in the recreation
of the other employees. She kept a diary, and we saw
this note: “I am doing the best I can. My weekly
quota is $250. My books show that I have made $289
this week. I am much better now and feel a little
happy at times. I know what it is to be hungry, and
sleep good too.” 

The psychiatric worker's report from the buyer, two 
weeks after she was referred to us for study, is as fol- 
lows: “She does anything you ask her to do. Is al- 
ways cheerful about it. I would like to keep her. She 
is slow at getting things, but once she has them, then 
they’re there.” Later she states: “Very favorable. 
Special emphasis on willingness and good nature with 
customers.” 

She has been closely followed up, and the last report 
from her department is that she is doing well and they 
do not want to lose her. She became, in the long run, 
a well-adapted and satisfactory employee. . . . 

Illustrative Case. Female, aged 19 years. Cashier. 

Problem: Referred by superintendent of depart- 
ment, on account of the great number of errors and 
shorts which she made, with the statement that if she 
did not improve soon, she would be laid off. 

Physical Findings: Height, 5 feet 8 inches. Weight, 
110 pounds. Considerably undernourished. Has had 
several attacks of tonsilitis and recently has had a 
growth on plantar surface of right foot, which has been 
removed. On recent examination in our hospital, she 
has been rated fair in physical condition except for 
underweight. 

Mental Findings: She has an intelligence quotient 
of 88, rating her as dull average in intelligence. She 
was found to be slow in speed tests, fair in learning 
ability, poor in motor dexterity, and showed a tendency 
to error-making in accuracy tests. She has a pleasant 
and agreeable personality make-up, though a little pe- 
culiar and not very accessible to a personal interview, 
but does not concentrate well on the things at hand, 
showing a definite tendency to mental reverie. She has
a good general appearance, is rather attractive, neat,
and well dressed. There is considerable emotional upset.
She is rated poor on our cashiering test, and we
do not believe she should be kept at this work. She 
has a good personality for sales, and if her mental con- 
flicts can be successfully dealt with, she ought to make 
a good sales clerk. 

Social Findings: Mother dead; father married sec- 
ond time. Home situation unhappy. Had to leave 
home and is boarding with a private family. Said that 
she was forced out of her own home. Has to support 
herself. Has had two years of high school. This is 
her first job. 

Conclusions: This girl will never make a successful 
cashier. She is too slow in motor speed, her dexterity 
is poor and is not accurate by nature. She should suc- 
ceed fairly well at sales. Transfer to sales is recom- 
mended. 

Follow-up: Girl was transferred to house furnish- 
ings as sales clerk, where she did well, being rated as 
satisfactory by the head of the department, who ex- 
pressed himself as pleased with the new clerk several 
months later. Inasmuch as she did not like it in the 
basement, she was transferred, after a period, to a de- 
partment on the first floor, where she did fairly well. 
She has been under the supervision of the psychiatrist, 
and has improved considerably in her personality dis- 
order. Her health is excellent, her weight has improved 
under a careful dietary regime, and her sales are fairly 
good. The average selling cost of her department is 
4.35 per cent, while this employee’s selling cost is 4.29 
per cent, which means she is under the average cost of 
her department. Her interest in stock has been marked. 
She has proved reliable in stock work and gradually has 
taken on responsibility. Her contact with other girls 
has been good and she is now being tried out for head 
of stock — a promotion for which there is considerable 
competition. The lesson to be learned in this case is 
one of suitable placement according to the ability and 
personality make-up of the employee. 




In concluding this chapter, we again quote from Bing- 
ham his closing remarks to industrial explorers, as 
follows: 

We have skirted along the margin of territory as yet
almost unexplored. Some of those who have hereto- 
fore prospected in these regions have already struck 
pay dirt. The means and instruments for digging out 
the treasure and refining it are at hand. We have our 
ideals of measurement, and the techniques for finding 
out how the variables measured are related to each 
other. Some of the ore does not even need to be dug 
out, because it has already been mined by time-study 
men, and wage clerks. 

Many of you may be supervisors, foremen, or
superintendents, face to face with practical problems in
human engineering. If you are, don’t overlook these data
about individual differences among workers. Remem- 
ber your ideals and techniques of scientific method. 

As engineers you deal with machines, materials, money, 
and men. In the past, too many engineers have made 
the mistake of utilizing science only when dealing with 
machines or materials or money. Men, too, can be 
measured.⁴

4 Bingham, W. V., “The Science of Work,” Technology Review, July
1932, p. 393. 



Part VI 

MEASUREMENT OF THE MORE GENERAL 
TRAITS OF PERSONALITY 




CHAPTER XIX 


The Nature of Measurement 
of Personality 


WE USUALLY think of an individual’s personality
as expressive of his effectiveness in social
relationships. This aspect of his make-up seems the all- 
important one in determining how well he will get along 
with his fellow beings; in deciding whether he will be 
aggressive enough to make the best utilization of his 
abilities and capacities; in determining to what extent 
he will impress and influence others; and in determining 
how he will meet the situation in an indefinite number of 
other ways. Allport has defined personality as the in- 
dividual’s characteristic reactions to social stimuli and 
his adaptation to the social features of his environment. 
He has analyzed these reactions and has included as the 
factors of prime importance to one’s personality: (1) 
intelligence, or general adaptability; (2) motility, or 
speed of one’s reactions; (3) temperament, or one’s emo- 
tions, moods, and attitudes; (4) self-expression, or one’s 
manner of personal adjustment in a social world; and 
(5) sociality, or one’s social participation and manner of 
one’s solution of social problems. The term personality
testing has generally been limited to the last three of 
these, since there are more or less distinct problems of 
quantitative measurement setting off these three from 
the more easily measured first two. “Personality tests,” 
then, include tests for a large number of non-intellectual
traits rather loosely grouped under the heading of
personality.

The work in the field of personality measurements has 
grown largely out of testing in the more objective fields 
of mental ability and achievement. The many instances 
of inability to predict with accuracy school performance, 
vocational success, or other behavior on the basis of 
measurements of intellect and knowledge, have empha- 
sized the need for quantitative measurements of personal 
attributes which enter into performance or success. Be- 
cause of their success with quantitative measurement of 
intelligence and various aspects of achievement, psy- 
chologists have been encouraged to extend their attempts 
and to try similar methods for measuring the more gen- 
eral personality traits. 

I. The Difficulties of Measuring Personality Traits 

The quantitative estimating of personality traits, 
however, has lagged behind the objective measurement 
of mental ability and school achievement. Even today 
we must still regard measurement of personality traits 
as in an early experimental stage. There are several 
reasons for the lag, among which may be mentioned the 
following: 

(a) Complexity of the thing to be measured. There 
is no denying that personality is a complex entity, and 
that its complexity makes its measurement more difficult 
than the measurement of some simple part of our make- 
up. And yet, we might say, complexity does not prevent 
the quantitative study of human traits. Intelligence is 
measured with more than a fair degree of accuracy and 
it is certainly a complex quality. The difference is that 
psychologists and those who utilize measurements of
intelligence have come to some general agreement as to
what general mental ability means and what part intel- 
lectual traits play in behavior. There is no such agree- 
ment as to personality and personality traits. Perhaps 
such an agreement can never be reached. Perhaps the 
separate personality traits do not summate into a whole
concept of personality in the way that intellectual traits 
seem to summate into general intelligence. Or perhaps 
it is that the various separate elements of general intel- 
ligence are so highly correlated that sampling a few 
gives an indication of the whole; whereas the various 
elements of personality are not so correlated that a mod- 
erate sampling of the separate elements can give a valu- 
able total measurement. 

(b) Difficulty in setting up problems and tasks. It is 
difficult to set up means of arriving at satisfactory meas- 
ures of a personality trait once it has been defined. It 
is quite feasible without great difficulty to set up sample 
tasks in which the extent of a person’s information about 
a given subject is measured, or his ability to comprehend 
printed material of a given difficulty is estimated. But 
it is another matter to set up sample tasks by which a 
person’s aggressiveness, or his leadership ability, or his 
morality will be measured. Even if we set up and stand- 
ardize sample situations for a test, the very fact of their 
experimental nature may defeat our whole purpose. If, 
as has often been attempted, our set-up is a problem 
situation on paper, the situation is hypothetical instead 
of actual, and the answer of the individual taking the 
test is likely to be hypothetical instead of actual also. 
Social behavior is to a large extent a matter of habit; a
person may know the correct thing to do and so express 
it on a pencil-and-paper test, but fail to act accordingly 
in the actual situation. 




(c) Difficulty in evaluating responses to test questions
and problems. Establishing standards as to correctness 
of responses to social problems is not so easy as establish- 
ing standards of correctness as to meaning of vocabulary 
words, as to the solution of arithmetic problems, or as to 
memory of something one has just read. Accuracy of 
measurement of personality traits suffers from the neces- 
sity of depending, in too many instances, upon subjective 
judgments in evaluating the responses to test problems. 

(d) Lack of criteria for evaluating our methods of 
measuring. No test or instrument of measurement for 
human traits can be used with confidence until its valid- 
ity has been demonstrated. Investigations into the 
validity of measurements of personality traits almost 
always meet a great obstacle in the difficulty of securing 
independent measures of the traits to be studied. We 
may have an objective test that purports to measure 
one’s ability to make judgments in social situations, but 
how are we to demonstrate that the test actually meas- 
ures such ability? Where are we to secure a measure 
of such ability, independent of our test, with which to 
check the test results? No objective production record 
of social judgments on a large number of persons is avail- 
able. No one form of behavior or social participation 
can be picked out that depends upon ability to make 
judgments in social situations alone. If we utilize the 
estimates of others as to the ability of our test subjects 
to make social judgments, the estimates are subjective 
opinions and are almost certain to be somewhat unreli- 
able. At best, the validation of our test is likely to be 
somewhat inconclusive. 

When all of these difficulties and problems have been
pointed out, the reader should perhaps be warned against 
too dark an outlook toward the quantitative study of
personality traits. Much has been accomplished, and
with a knowledge of the limitations of our present meas- 
uring instruments, much of value can be obtained by 
them. 


II. Methods of Personality Testing 

Three methods have been commonly used in the con- 
struction of devices for measuring personality traits: 
(1) the Rating Scale, (2) the Questionnaire, and (3) the 
Objective Test. The last of these we have met often in 
our discussion of mental and achievement tests. The 
other two are more common in the field of personality 
testing because of the difficulties of adapting the objec- 
tive test to this type of measurement. 

1. The rating scale. The rating scale furnishes a 
means of securing and recording quantitative estimates 
of the amount or degree to which an individual possesses 
the traits to be rated. The ratings represent subjective 
impressions of some judge. Usually the judge is another 
person who is in position to know the individual being 
rated, although “self-ratings” are sometimes made. 
There have been various types of rating scales developed, 
among which are: (a) the Man-to-Man Rating Scale;
(b) various types of Numerical Rating Scales; (c) the
Descriptive Rating Scale; and (d) the Graphic Rating 
Scale. 

(a) The Man-to-Man Rating Scale is of importance 
primarily from a historical standpoint, being seldom used 
now. It was developed during the World War and was 
widely used in personal ratings made in the Army. In 
accordance with this method, the one who is to rate a 
group of individuals in a certain trait first sets up a scale 
of variation in the trait, each degree in the scale being 
represented by a man recognized to possess that degree
of the trait. For example, in assigning ratings of a trait
in the Army, the rating officer was furnished with a 
blank containing the definition of the trait and an out- 
line as follows: 


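Highest  ____________________  15
High     ____________________  12
Middle   ____________________   9
Low      ____________________   6
Lowest   ____________________   3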


On the first blank line after the word “highest,” the 
rating officer wrote in the name of a man who, in his 
mind, best represented the highest amount of the trait 
to be rated. In a similar fashion, he selected a man 
well known to him who represented each of the degrees. 
Then those whom the officer had to rate were rated by 
direct comparison with the key men. This man-to-man 
scale has the advantages of being concrete and easily 
understood. It has the disadvantages of being cumbersome,
of consuming too much time in its construction
and application, and of allowing for too great discrepancies
due to differences in selection of key men by different
raters.
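
The man-to-man procedure can be sketched in a few lines of Python. This is only an illustrative sketch: the point values are those of the Army outline, but the key men's names and the two function names are hypothetical, and we assume that each comparison is recorded simply as the name of the key man whom the rated person most nearly resembles.

```python
# Point values taken from the Army outline; everything else is hypothetical.
SCALE_POINTS = {"Highest": 15, "High": 12, "Middle": 9, "Low": 6, "Lowest": 3}


def build_master_scale(key_men):
    """Map each key man's name to the point value of the degree he represents."""
    missing = set(SCALE_POINTS) - set(key_men)
    if missing:
        raise ValueError(f"no key man chosen for: {sorted(missing)}")
    return {key_men[degree]: points for degree, points in SCALE_POINTS.items()}


def rate_by_comparison(master_scale, nearest_key_man):
    """Return the point value of the key man the rated person most resembles."""
    return master_scale[nearest_key_man]


# Hypothetical key men written in by one rating officer.
key_men = {"Highest": "Adams", "High": "Baker", "Middle": "Clark",
           "Low": "Davis", "Lowest": "Evans"}
master = build_master_scale(key_men)

# A man judged most like Baker receives Baker's value.
rating = rate_by_comparison(master, "Baker")
```

Because each rating officer writes in his own key men, two officers using the same blank may attach the same point value to quite different standards, which is the discrepancy between raters mentioned above.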

(b) Numerical Rating Scales are those in which percentage
or point values are assigned as quantitative estimates
of the degree of possession of a trait. For example,
a judge may rate an individual in cooperativeness
by assigning a rating of 100 per cent, 90 per cent, or 80 
per cent for relatively high degrees of cooperativeness; 
and 30 per cent, 20 per cent, or 10 per cent for low de- 
grees. The scale may have a range from 0 to 10 or from 
0 to 6, or any other numerical range instead of being in 
percentage terms. Such a method of rating is simple, 
but unless used in conjunction with another method 
usually fails to yield accurate enough results. Such
methods rarely yield comparable enough results from
one rater to another, since different raters seldom agree 
upon just how much 50 per cent, or 5 points, represents. 

(c) The Descriptive Rating Scale consists essentially 
of a collection of descriptive phrases; those are checked 
that best describe the characteristics or traits of the in- 
dividual being rated. The descriptive phrases may be 
arranged under certain traits, the four or five descriptions 
representing the degrees of the trait; or there may be no
such arrangement. The former arrangement is illustrated
by the following phrases representing various degrees
of a certain aspect of social behavior:

1. Extremely breezy and informal. 

2. Cordial and congenial. 

3. Meets one half-way. 

4. Somewhat reserved. 

5. Constrained and formal. 

The Probst Rating Scale¹ for employees is an example
of the latter type. Items 15 to 20 in this scale of about 
100 items illustrate the arrangement: 

15. Might often be more considerate. 

16. Usually pleasant and cheerful. 

17. Always courteous. 

18. Cranky disposition. 

19. Often seems dissatisfied. 

20. Often grumbling or complaining. 

1 See Chapter XIV for discussion of this scale.

Quantitative measurements are usually obtained by
assigning numerical values to the various phrases which
may be checked. The rating itself, however, should be
given without reference to these numerical assignments.
The descriptive scales have the advantages of being
fairly concrete and of avoiding many of the difficulties
of the numerical scales.
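
The scoring of a descriptive scale can be illustrated with a short Python sketch. The numerical values given here to items 15 to 20 are invented for the illustration and are not the published scoring key of the Probst scale; the function name is likewise hypothetical. The rater only checks phrases; the values are applied afterward by the scorer.

```python
# Hypothetical values for the six sample phrases (NOT the published key).
PHRASE_VALUES = {
    15: -1,  # Might often be more considerate.
    16: +1,  # Usually pleasant and cheerful.
    17: +2,  # Always courteous.
    18: -2,  # Cranky disposition.
    19: -1,  # Often seems dissatisfied.
    20: -2,  # Often grumbling or complaining.
}


def score_checklist(checked_items, values=PHRASE_VALUES):
    """Sum the values of the phrases the rater has checked."""
    unknown = [item for item in checked_items if item not in values]
    if unknown:
        raise KeyError(f"no value assigned to items: {unknown}")
    # A phrase checked twice by mistake is counted only once.
    return sum(values[item] for item in set(checked_items))


favorable = score_checklist([16, 17])
unfavorable = score_checklist([15, 19, 20])
```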

(d) In Graphic Rating Scales each trait to be rated 
is represented by a straight line signifying the range of 
degree of possession of the trait. One end of the line 
represents the least amount of the trait; the other end, 
the greatest amount. A person is rated by making a 
check along this line at a point which corresponds to his 
degree of possession of the trait being rated. The 
graphic rating scale is usually combined with the descrip- 
tive type, short descriptions of the degrees of the trait 
being placed under the rating line. Numerical scoring 
is done by assigning values to the different divisions on 
the line. The following examples from the Personality 
Rating Scale² of the American Council on Education for
rating students illustrate the graphic rating technique: 

                                                         No Opportunity
                                                         to Observe

A. How are you and others affected by his appearance
   and manner?

   |-------------|-------------|-------------|-------------|
   Avoided       Tolerated     Liked by      Well liked    Sought
   by others     by others     others        by others     by others

B. Does he need constant prodding or does he go
   ahead with his work without being told?

   |-------------|-------------|-------------|-------------|
   Needs much    Needs         Does          Completes     Seeks and
   prodding in   occasional    ordinary      suggested     sets for
   doing         prodding      assignments   supplementary himself
   ordinary                    of his own    work          additional
   assignments                 accord                      tasks

C. Does he get others to do what he wishes?

   |-------------|-------------|-------------|-------------|
   Probably      Lets          Sometimes     Sometimes     Displays marked
   unable to     others        leads in      leads in      ability to lead
   lead his      take          minor         important     his fellows;
   fellows       lead          affairs       affairs       makes things go

2 Personality Rating Scale, Committee on Personality Measurement,
American Council on Education, Washington, D. C., 1929.


This type of rating scale is probably the most used 
today. It is easily understood, and is easily and quickly 
filled out. It frees the rater from quantitative terms 
and yet is capable of yielding a quantitative result with 
varying degrees of fineness of discrimination. It usually 
yields fairly comparable ratings when different raters’ 
estimates are to be combined or used together. 
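
The conversion of a check mark on the rating line into a numerical score may be sketched as follows. The sketch assumes a line 100 units long, measured from the low end, and divisions of equal width numbered from 1 upward; these particulars and the function name are assumptions made for illustration and are not taken from the American Council scale.

```python
from statistics import mean


def score_graphic_mark(position, line_length=100.0, divisions=5):
    """Convert the position of a check mark into the number of its division.

    position is measured from the low end of the line; divisions are of
    equal width and are numbered 1 (least of the trait) upward.
    """
    if not 0 <= position <= line_length:
        raise ValueError("the mark must fall on the line")
    width = line_length / divisions
    # A mark at the extreme high end belongs to the last division.
    index = min(int(position // width), divisions - 1)
    return index + 1


# The same mark scored coarsely (5 divisions) and more finely (10 divisions).
coarse = score_graphic_mark(37, divisions=5)
fine = score_graphic_mark(37, divisions=10)

# Marks made by three hypothetical raters for one student, then combined.
marks = [37, 55, 48]
combined = mean(score_graphic_mark(m) for m in marks)
```

The number of divisions is left as a parameter because the same line can be scored with varying degrees of fineness without changing what the rater is asked to do.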

2. The Questionnaire. As is true of the rating scale, 
the questionnaire is a technique commonly used in ob- 
taining quantitative estimates of traits not adaptable to 
direct testing. The chief difference is that the question- 
naire represents a systematic report of attitudes, beliefs, 
reactions, interests, etc.; whereas the rating scale represents
estimates or judgments about the possession of a
trait or characteristic. The two sometimes overlap, 
there being some personality “blanks” which are hard to 
classify. 

Personality questionnaires have been arranged in 
various forms. They sometimes consist of a list of ques- 
tions to which direct answers are to be given. More 
often they consist of items or statements to be marked 
“Yes” or “No”; “Like,” “Dislike,” or “Indifferent to”; 
or to be marked in some similar fashion to indicate the 
subject’s reaction to the items. Still others require that 
the person answering the questionnaire check the items 
applying to him, describing him or expressing his beliefs. 
Examples from some personality questionnaires follow: 





From: WILLOUGHBY EMOTIONAL
MATURITY SCALE³

(The person taking the test checks a statement if it
describes him, or if it describes the person whose
characteristics he is rating. S is the subject being rated.)

1. S is ordinarily friendly toward members of his
   immediate social group, but in critical periods
   becomes irritable or hostile.

2. S is extremely solicitous of his immediate
   family associates.

3. S makes his plans with objective reference to
   his own death when this issue is involved, and
   has no emotional reaction greater than that,
   for instance, concerned in planning with
   reference to a long journey.

4. S is meticulous in matters of dress; a considerable
   part of his income may be spent in
   this activity, even though strict economies
   are thereby necessitated elsewhere.

From: THE HUMM-WADSWORTH
TEMPERAMENT SCALE⁴

(The person taking the questionnaire underscores
“yes” or “no” to each question.)

  1. Do you like to meet people and make
     new friends?                              Yes   No

  5. Does noise readily waken you from
     sleep?                                    Yes   No

 14. Have you several times been unjustly
     punished?                                 Yes   No

 28. Has more than one person called you
     hot-headed?                               Yes   No

189. Are there very few people in whom you
     can confide your troubles?                Yes   No


3 Willoughby, R. R., Emotional Maturity Scale, Stanford University 
Press, Stanford, Calif., 1931. Quoted by permission. 

4 Humm, D. G., and Wadsworth, G. W., “The Humm-Wadsworth
Temperament Scale,” American Journal of Psychiatry, Vol. 92, No. 1,
July 1935. Quoted by permission.




From: A-S REACTION STUDY, by
G. W. & F. H. Allport⁵

(The person answering checks one of the choices
given after each question.)

1. A salesman takes manifest trouble to show you a
   quantity of merchandise; you are not entirely suited;
   do you find it difficult to say “No”?

     yes, as a rule
     sometimes

2. Some possession of yours is being worked upon at a
   repair shop. You call for it at the time appointed,
   but the repair man informs you that he has “only
   just begun to work on it.” Is your customary
   reaction

     to upbraid him
     to express dissatisfaction mildly
     to smother your feelings entirely

3. When you are served a tough steak, a piece of
   unripe melon, or any other inferior dish at a high-class
   restaurant, do you complain about it to the waiter?

     occasionally
     seldom
     never

5 Allport, G. W. & F. H., A Scale for Measuring Ascendance-Submission
in Personality, Houghton Mifflin Co., Boston, 1928. Quoted by
permission.

The items of a questionnaire are usually included only
after considerable experimentation with items
constructed to find out about the trait or traits to be
measured. If items are to be assembled to measure the degree
of possession of Extrovert-Introvert Tendencies, then
experimentation should demonstrate a reliable difference
between answers of introverts and extroverts before the
items are included. If questions are to be used to
discover neurotic trends, then it should be demonstrated
prior to the final use of the questions that their answers
actually differentiate neurotics from normals. This
validation of items or questions involves many of the
difficulties of validation of personality tests in general, but
without it the questionnaires could be considered of
little value.
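
One common way of checking whether an item shows a reliable difference between two criterion groups is to compare the proportions of the two groups giving a certain answer and to divide the difference by its standard error. The Python sketch below assumes this particular check; the text does not prescribe a specific statistic, and the function name, group sizes, and counts are invented for illustration.

```python
import math


def critical_ratio(yes_a, n_a, yes_b, n_b):
    """Difference between two proportions divided by its standard error.

    yes_a of the n_a persons in group A, and yes_b of the n_b persons in
    group B, gave the answer in question.
    """
    p_a = yes_a / n_a
    p_b = yes_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    if se == 0:
        raise ValueError("the standard error is zero; the ratio is undefined")
    return (p_a - p_b) / se


# Hypothetical counts: 70 of 100 introverts and 40 of 100 extroverts
# answered "Yes" to a trial item.
ratio = critical_ratio(70, 100, 40, 100)

# An item answered alike by both groups does not separate them at all.
useless = critical_ratio(50, 100, 50, 100)
```

A large ratio suggests that the item separates the two groups and may be retained; an item whose ratio is near zero would be dropped. The size of ratio to be required is a matter of judgment that the sketch does not settle.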

It is customary to assign numerical values to answers 
on a questionnaire in accordance with the demonstrated 
worth of the answers in indicating the trait to be meas- 
ured. Thus a total questionnaire may give a score repre- 
senting degree of possession of introversion, submission 
tendencies, neurotic trends, or emotional maturity. 
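
The scoring just described amounts to a weighted sum over the answers given. In the Python sketch below the weights, the item numbers, and the function name are hypothetical; in practice the weights would come from the experimental validation of each answer.

```python
# Hypothetical scoring key: (item number, answer) -> weight.
SCORING_KEY = {
    (1, "yes"): -2, (1, "no"): 2,
    (2, "yes"): 1, (2, "no"): -1,
    (3, "yes"): 3, (3, "no"): 0,
}


def score_questionnaire(answers, key=SCORING_KEY):
    """Add the weights of the answers given; unkeyed answers count zero."""
    return sum(key.get((item, answer), 0) for item, answer in answers.items())


# One subject's answers, recorded as item number -> answer.
answers = {1: "no", 2: "yes", 3: "yes"}
total = score_questionnaire(answers)
```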

3. Objective tests. By this method personality is 
measured by the individual’s actual performance on tasks 
or problems set before him in the test. The method is 
the method of psychological testing met almost univer- 
sally in measuring intelligence and academic achieve- 
ment. The Social Intelligence Test described in Chapter 
XX and the performance tests of the Honesty and 
Trustworthiness Series of Hartshorne and May described 
in Chapter XXII are good examples. Most of the in- 
struments for measuring personality that are in the form 
of objective tests present the difficulties of point (b) of 
our discussion of general difficulties in measuring per- 
sonality. 

Relative values of personality test techniques. The 
rating and questionnaire methods, as compared with 
objective tests, usually lack reliability. By these meth- 
ods we seldom find exactly the same results when the 
same judge rates the same group at different times; and 
two or more judges rarely show agreement in their
estimates of the same individuals. However, much of the
potential unreliability of these methods can be avoided 
by carefully constructed scales, by conscientious, trained 
judges or raters, and by combining ratings of several 
judges. Under favorable conditions, the reliability of 
these methods, while below that of standardized intelli- 
gence tests, is still high enough to warrant their con- 
tinued use. This is especially true in the face of the 
impossibility of applying objective test methods to all 
the traits for which measurements are needed in the 
course of dealing with human problems. 
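
The agreement of two judges, and the combining of several judges' ratings, may be illustrated by a brief Python sketch. The ratings are invented, the function names are hypothetical, and the product-moment correlation is used here merely as one familiar index of agreement; the text does not specify which index is meant.

```python
from statistics import mean


def pearson_r(x, y):
    """Product-moment correlation between two equally long lists of ratings."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length, at least two ratings")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("each judge's ratings must vary")
    return sxy / (sxx * syy) ** 0.5


def pool_ratings(*judges):
    """Average, person by person, the ratings given by several judges."""
    return [mean(ratings) for ratings in zip(*judges)]


# Hypothetical ratings of the same five persons by three judges.
judge_a = [4, 3, 5, 2, 4]
judge_b = [5, 3, 4, 2, 3]
judge_c = [4, 2, 5, 3, 4]

agreement_ab = pearson_r(judge_a, judge_b)
pooled = pool_ratings(judge_a, judge_b, judge_c)
```

The pooled list gives each person a single rating in which the peculiarities of any one judge carry less weight than they would alone.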

When the factors which reduce the reliability of ratings
and questionnaire estimates are considered, the validity 
correlations for these methods seem fairly satisfactory — 
indeed, as satisfactory as the validity correlations usually 
obtained for objective personality tests. No technique 
of personality measurement has yet proved thoroughly 
valid. The best that we can do at the present stage of 
testing in this field is to accept the best of the devices of 
all types, wherever possible supplementing the one by 
the other. 



CHAPTER XX 


The Measurement of Social Attributes 

IN THIS chapter we shall consider certain instruments
of measurement which have been worked out
for estimating degree of ability to get along with people 
or to adjust oneself in a social world — a quality often 
spoken of as “social intelligence.” 

The value of the ability to live and to work harmoni- 
ously with one’s fellow-beings, or social intelligence, 
hardly needs emphasizing. Lack of this quality partly 
accounts for many failures where abstract intellectual 
ability is sufficient or over-sufficient. Capacity to ap- 
preciate the feelings and attitudes of others and to carry 
on satisfactory relationships with them is absolutely es- 
sential in many of life’s occupations. Positions that in- 
volve personal contacts require no qualification more 
important than the power to analyze social situations, to 
appreciate the forces that govern the behavior of people, 
to deal with subordinates, and to understand and to exe- 
cute the policies of superiors. 

The man is conspicuously lacking in these qualities 
who builds up a friendship and, as host, offends his 
friend; or who heads his college class in scholastic honors 
only to fail in the practical world; or who laboriously 
trains an apprentice and, because of a thoughtless out- 
burst of temper, loses him at the beginning of his period 
of real usefulness. 

One of the measurements that we shall discuss has been
developed for indicating social intelligence at a relatively
high level of general ability; the other has been devel- 
oped for measuring it over a wider range. 

I. The Moss-Hunt-Omwake Social Intelligence Test 

1. The factors tested. The separate parts of this test 
are based on the elements entering into the ability to get 
along with other people. Parts included in the various 
forms of the test have covered seven factors of impor- 
tance in social intelligence. A discussion of these factors 
will indicate the purposes of the test. 

One of the most important elements in social intelli- 
gence is the ability to remember names and faces. The 
person who gets along best with others does not have to 
be introduced to a man many times before he remembers 
that he has met him before. If he is a salesman, he is 
able to address the prospective customer by name if he 
calls at the place of business a second time. The pleas- 
urable reaction in one who is recognized and called by 
name is realized by the politician, who attempts to call 
every doubtful voter by name, and by the traveling sales- 
man, who learns the buyer’s name before calling upon 
him. 

There are two factors to be tested in this ability. One 
is recognition of the face that has been seen before; the
other is linking the name with the face. The two ordinarily
exist together, but it is not unusual to find a person
who readily remembers faces, but experiences difficulty
in remembering names. Results of use of the test
indicate, as would be expected, that it is easier to pick 
out from a large number of faces those that have been 
seen before, than it is to link the proper names with 
the faces. 

Another factor in social intelligence is the ability to
recognize from facial expressions the attitudes, states of
mind, or emotions in others. The first requirement for 
successful dealing with others is knowledge of how the 
other is reacting or how he feels. One who is unable to 
tell by a person’s expressions whether he is indifferent, 
mildly interested, or greatly interested, is hard put to 
select the most effective means of clearing up a difficult 
situation. One who cannot distinguish between expres- 
sions of disgust and anger may fail to choose the appro- 
priate word at a critical moment. In short, this factor 
in social intelligence involves the ability to recognize 
from the facial expression the mental state or attitude 
behind it. 

Closely related to the trait just described is the ability 
to judge the state of mind or motive behind words. It 
is clear to everyone that words of anger and words of ad- 
miration are vastly different. But very much finer dis- 
tinctions between spoken expressions of mental states 
can be made. The finer the distinctions a person can 
make and the quicker he is at discerning the motive be- 
hind words, the better the words of others are as a guide 
to his social relations. Since words are one of the most 
widespread means of expressing thoughts and motives, it 
is not difficult to see why a person who is unusually suc- 
cessful in his relations with others is likely to be one who 
can see farthest behind the words. 

Comprehension of social situations and judgment in 
solving the situations are of extreme importance. The 
motives or causes producing situations should be clear to 
the persons involved, and the methods of solving social 
problems must be available. The correct solution of a 
problem may be vastly different, according to whether 
the situation was produced in relations of friendship and 
pleasant motives or of enmity and unpleasant motives.
Of the various actions possible in a given situation
there is perhaps one which would produce absolute
discord, another which would leave unfriendly feelings, 
another which would answer satisfactorily but leave an 
attitude of indifference, and another which would answer 
just as satisfactorily and in addition build up pleasant 
relationships. With these possibilities in mind, the per- 
son who has the most social intelligence chooses the last. 

Ability to deal with people is reflected through one’s 
knowledge of human nature as obtained by observations 
of people under various circumstances. Everyone has 
had opportunities to observe human nature; to find out 
that certain tactics succeed while others fail. Thus ob- 
servation of human behavior has been included as a 
factor in the Social Intelligence Test. The one who is 
most adept in getting along with people will be, in gen- 
eral, the one who is most interested in people and has 
most accurately observed their behavior. 

Breadth of knowledge, or diversity of interests, is an- 
other important factor. Dealing with people success- 
fully means being interested in the things in which they 
are interested. The more things with which a person is 
familiar, the more likely he is to have something in com- 
mon with others and therefore be better able to get along 
with them. Since fields of interest are so varying, the 
person with a knowledge of the greatest number of fields 
can appeal to the greatest number of people. The person
with a high degree of social intelligence is not at a loss
for words if the conversation happens to be outside of
his own limited vocation. Diversified interests, more- 
over, are valuable not only in themselves but also in- 
directly in many of the factors mentioned before: judg- 
ment, for example, is limited by interest and knowledge. 

Lastly, sense of humor seems related to social intelligence.
One devoid of such a sense lacks an important
point of contact with his fellow beings. 

2. Sample social intelligence test items. The various 
parts of the test have been prepared to measure the 
above-described qualities in such a fashion as to be suit- 
able for group testing. Samples of various of the parts 
are presented from the revised form of the test. 

1. JUDGMENT IN SOCIAL SITUATIONS 

Directions. Four answers are suggested for each of
the following questions. Make a check (√) in the
space in front of the answer which you consider to be
most nearly correct. Do not check more than one
answer. If you do, your work on that question will
not be counted.

1. A man who has been a traveling salesman for fifteen
   years decides, under pressure from his family, that
   he will stay in one place and is transferred to the
   general office of his company. You would expect
   him to:

     ____ Like the office work because it is restful.
     ____ Become restless under office routine.
     ____ Seek a position with another firm.
     ____ Be very inefficient in his office work.

2. You wish to ask a favor of an acquaintance whom
   you do not know very well. The best way to ask
   him would be to:

     ____ Try to impress upon him that he is the one
          who will benefit.
     ____ Tell him how greatly he can benefit you if
          he does it.
     ____ Offer to do something for him in return.
     ____ Ask him, briefly stating your reasons.

3. Assume that you are a girl and that you meet an
older woman on the street, a very slight acquaintance,
whose eyes show evidence of weeping. It
would be best to:

    ____ Ask her why she is sad.
    ____ Put your arm around her in a consoling manner.
    ____ Appear not to notice her distress.
    ____ Appear not to see her at all.

4. Suppose you are president of a community center
organized for improving community conditions.
Meetings for the last three months have been poorly
attended. The best way to bring more citizens to
the next meeting would be to:

    ____ Visit some of the prominent citizens and lay some of the new problems before them.
    ____ Advertise an interesting program for an evening meeting.
    ____ Post notices of the meeting in all public buildings.
    ____ Send a personal notice of the meeting to all members.

5. Smith, an efficient employee, but one who thinks he
“knows it all,” has for several days been criticizing
Jones’ method of doing a particular piece of work
they are doing. The boss later suggests that all the
workers follow Jones’ method, since it saves time.
Smith will most likely:

    ____ Ask the boss for another job.
    ____ Keep on doing it his own way without comment.
    ____ Follow Jones’ method but continue to criticize it.
    ____ Follow Jones’ method but purposely work less efficiently.

6. Suppose you are a junior clerk in a large office. The
chief of your section enters your room while you are
reading a newspaper when you should be working.
The best way out of the situation would be to:

    ____ Continue reading the newspaper and show no embarrassment.
    ____ Fold it up and return to your duties.
    ____ Appear to be making news clippings relative to your work.
    ____ Try to interest the boss by reading aloud an important headline.

2. RECOGNITION OF THE MENTAL STATE OF THE SPEAKER

    1. Ambition           10. Hypocrisy
    2. Admiration         11. Indecision
    3. Despair            12. Jealousy
    4. Determination      13. Loneliness
    5. Disappointment     14. Love
    6. Disgust            15. Rage
    7. Envy               16. Regret
    8. Fear               17. Scorn
    9. Hate               18. Suspicion


Directions. In the parentheses before each of the fol- 
lowing quotations write the number of the word from the 
list above which most accurately describes the mental 
state of the person making the statement. Some of the 
mental states in the list may not be represented at all 
below, and some may be represented more than once. 

( ) No one is able to stop me: I will do that which I
    intended to do or die in the attempt.

( ) There is something in the way he deals that makes
    me want to cut the cards.

( ) A glance from your eyes, a touch of your hand,
    and the gates of paradise swing wide for me.

( ) Drink as much wine as you please but preach the
    benefits of water.

( ) The idea of asking those Baileys! They wouldn't
    even know a reception from a strawberry festival.

( ) I wish I had your opportunity. Things are always
    handed to you on a silver platter, but I never get
    a chance to do anything.

( ) Now could I drink hot blood and do such bitter
    business as the day would quake to look on.

3. OBSERVATION OF HUMAN BEHAVIOR 

Directions. If the statement is true, encircle the T;
if it is false, encircle the F.

T F 1. In pleasure the corners of the mouth are 
pulled down. 

T F 2. Pretense and sham are often inspired by the 
desire for social admiration. 

T F 3. Most people tend to imitate those whom they 
admire. 

T F 4. It is easier to remember to wind an eight- 
day clock than one that must be wound every 
day. 

T F 5. All men are created equal in mental ability. 

T F 6. We are more shocked by our errors in etiquette
than by those in logic.

T F 7. In fear there is a tendency for the eyes to 
become more widely opened. 

T F 8. As a rule we should place little confidence in 
those who appear to love us extremely on a 
slight acquaintance. 

T F 9. A person of strong character usually makes 
firm friends and bitter enemies. 

T F 10. For most people, forbidding an act increases 
the pleasure of doing it. 

T F 11. A mother’s estimate of her child is the most
reliable one.

T F 12. Good conduct is a reliable indication of high 
intelligence. 

T F 13. The salesman who makes the most sales is 
usually the most popular with the other sales- 
men. 

T F 14. One of the surest methods of bringing a man 
to your point of view is by engaging in argu- 
ment with him. 

T F 15. With the average person there is no more 
pleasing sound than praise of himself. 



STUDY SHEET-MEMORY FOR NAMES AND FACES 

Study each of the twelve faces very carefully and try to remember 
the name that goes with it, for later you will have to recognize these 
faces in a larger group and remember their names. You will have four 
minutes to study this sheet. 




( ) Mike Bailey.
( ) Clifton Clark.
( ) George Cook.
( ) Tom Edwards.
( ) Ben Elliott.
( ) Lee Higgins.
( ) Howard Jones.
( ) Jake McDonald.
( ) John Moore.
( ) Chester Sims.
( ) Sid Smith.
( ) Fritz Wagner.


4. SENSE OF HUMOR 

Directions. In each of the following, pick out the one
of the four suggested completions that makes the best
joke, and write its number on the line at the right.

1. Cable to Scotchman whose wife had been lost
at sea after a shipwreck: “Wife’s body
found. Attached to it a rare fish for which
the British Museum offers four hundred
pounds.” Reply: (1) “Accept the offer.
Reset the bait.” (2) “Sell fish to highest
bidder.” (3) “No more fish stories.” (4)
“My wife always said she could catch anything.”

2. Physician while taking case history asks,
“Are you married?” Patient: (1) “Yes,
but I pay the bills.” (2) “That was twenty
years ago.” (3) “My wife chooses her own
doctor.” (4) “No, the reason I look this
way is because I’m sick.”

3. Lady in a lower berth of Pullman train, on
being annoyed by the snoring of man above,
tapped on the upper wall. Answer from
above: (1) “Who paid for this upper
berth?” (2) “All right, close your window.”
(3) “I don’t like your snoring either.” (4)
“Sorry, lady, I saw you when you came in.”

4. The Judge: “Wife desertion, Rastus, is a
terrible thing, about which I feel very
strongly and for which I must punish you
severely.” Rastus: (1) “I ain’t a deserter,
I’se a refugee.” (2) “I’se only takin’ a rest
for my health.” (3) “I won’t desert her
again.” (4) “There ain’t no justice.”

5. “Johnny, if you eat more cake you’ll burst.”
(1) “Why I’ve eaten this much before.” (2)
“No, I have a tough stomach.” (3) “Then
I’ll be able to take still more.” (4) “Well,
pass me some and get out of the way.”



6. She: “Since you pride yourself on being able
to judge a woman’s character by her clothes,
what would be your verdict of my sister
over there?” (1) “Short-sighted.” (2)
“Acquitted.” (3) “Insufficient evidence.”
(4) “She has an alibi.”

7. “Eliza,” said a friend of the family to the
old colored washer woman, “Have you seen
Miss Edith’s fiance?” (1) “Oh, yes ma’am,
he sho’ do look like Gary Cooper, don’t he?”
(2) “No ma’am, he don’t come here ’ceptin’
Wednesdays and Saturdays.” (3) “No
ma’am, it ain’t been in de wash yet.” (4)
“No, I always borrows money from my
friends.”

3. Analyses of the social intelligence test. (a) Its
reliability. Results obtained in using the Social Intelligence
Test indicate that its reliability is sufficiently high.
One hundred college sophomores took the test twice,
four months apart, the test being administered by two
different persons. Scores on the two different trials correlate
0.89. For 129 college students in another university,
when scores on odd questions are correlated with
scores on even questions and the Spearman-Brown prophecy
formula is used for predicting reliability of
the whole test, the reliability is 0.88. Scores on two different
forms of the test correlate 0.85.
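
The split-half figure rests on the prophecy formula, which steps up the correlation between two half-tests to an estimate for the whole test. A minimal sketch in Python follows. The function name and the half-test correlation of 0.786 are illustrative assumptions; the text reports only the stepped-up value of 0.88.

```python
def spearman_brown(r_half: float, k: float = 2.0) -> float:
    """Estimate the reliability of a test lengthened k times.

    r_half is the correlation between the two part-scores (for example,
    odd items versus even items); k = 2 is the usual split-half case.
    """
    return k * r_half / (1.0 + (k - 1.0) * r_half)


# Assumed half-test correlation, chosen only so that the result lands
# near the whole-test reliability of 0.88 reported in the text.
r_odd_even = 0.786
print(round(spearman_brown(r_odd_even), 2))
```

With k = 2 the formula reduces to 2r / (1 + r), so the whole-test estimate is always higher than the half-test correlation from which it is computed.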

(b) Its validity. To know the extent to which the
test actually measures ability to deal in human relationships,
it is necessary to have some measure of social intelligence
with which to compare scores. As would be
expected, it has been difficult to find such a measure
sufficiently objective and reliable to use in correlating
scores. Since very few quantitative studies of this
ability have been made, largely because of lack of means
to measure objectively “adeptness in handling people,”
adequate measuring devices for establishing a criterion
have not been easily available.

One means of obtaining a definite measure of complex 
traits for which it is impossible to obtain test measures 
is through personal ratings by competent persons who 
know the individuals to be rated. In a large sales com- 
pany 98 employees who took the Social Intelligence Test 
were rated by a superior executive who had good op- 
portunity to know their ability to deal with people. 
The ratings on a seven-point scale correlated 0.61 with 
Social Intelligence Test scores. Of those making above- 
average scores on the test, 75 per cent, as rated by the 
judge, were above average in ability to get along with 
people. 

Study has also been made of the relationship between 
test scores and ratings of students, student ratings rep- 
resenting a combination of teacher ratings and sorority 
or fraternity ratings. The study gave positive results, 
but somewhat lower than for the industrial rating study; 
correlations averaged about 0.40. 

Studies of the Social Intelligence Test have also con- 
sidered, as a criterion of the ability to be measured, the 
extent of participation by students in extra-curricular 
activities. In one study of 262 freshmen taking full- 
time college work, the activities listed for study are those 
in which participation is entirely voluntary and aside 
from class work, including the following: athletics, dra- 
matics, literary society, class offices, student publications, 
debate, glee club, fraternities or sororities, and participa- 
tion in social functions. The assumption has been that 
extra-curricular activities are an indication of the socia- 
bility of the student or the skill with which he deals with 
his fellow students. This has been considered a safe
assumption, since extra-curricular activities depend directly
upon association and dealing with fellow students.
The student who is unable to adjust himself to the ac- 
tions of others, or who does not like to engage in pursuits 
where he must manage relations with others, is an infre- 
quent participant in campus activities. 

Table XXIV

RELATIONSHIP OF SOCIAL INTELLIGENCE TEST SCORES TO
NUMBER OF EXTRA-CURRICULAR ACTIVITIES ¹

Number of Activities    Lower Quarter Point    Median    Upper Quarter Point
Four or more                    105              116             125
Three                            97              112             121
Two                             101              110             124
One                              92              105             117
None                             83               99             113


Of the 262 freshman students, 90 engaged in no extra- 
curricular activities, 59 in one, 62 in two, 31 in three, and 
20 in four or more. Table XXIV gives the scores on 
the Social Intelligence Test for the different groups. 

On another occasion, a group of upper-class students 
was studied. Activity scores were calculated for the 
students by assigning numerical values to various types 
of participation as indicated by yearbook and school 
records, and by answers to questionnaires filled in by the 
students. Activity scores ranged from 0 to 50. The 
total scores on the Social Intelligence Test show the dif- 
ferentiations indicated in Table XXV. 

Such studies have been interpreted as indicating that
the test is measuring something of significance in determining
participation in campus organizations and activities.
Equal results are not obtained with other measures
of student ability, such as abstract intelligence tests,
grades, and scholastic honors.

1 Hunt, T., “The Measurement of Social Intelligence,” The Journal of
Applied Psychology, Vol. XII, No. 3, June 1928.


Table XXV

RELATIONSHIP OF SOCIAL INTELLIGENCE TEST
SCORES TO ACTIVITY SCORES ²

Activity Score    Social Intelligence Test Average
30 or over                     128
20 to 29                       125
15 to 19                       117
10 to 14                       118
5 to 9                         118
1 to 4                         114
0                              113


4. Relationships of social intelligence as measured by
the test. (a) To other types of intelligence. Studies
of the relationship between social intelligence and ab- 
stract intelligence and concrete intelligence may throw 
some light upon the distinctness with which we can speak 
of the three types of intelligence included in Thorndike’s 
three-fold division. Such studies may also throw some 
light upon the extent to which we can measure social 
intelligence as distinguished from the other two types, 
particularly as distinguished from abstract intelligence. 
It has frequently been suggested by critics of the Social 
Intelligence Test which we have discussed in this chap- 
ter that the test measures too large an element of abstract 
intelligence, probably mainly through its verbal medium 
of testing. The correlations between the Social Intelli- 
gence Test and abstract intelligence tests may help to 
answer this criticism. The literature contains reports
on eight studies of the relationship between the Social
Intelligence Test and abstract intelligence tests, all these
studies being done on college students. These are summarized
in Table XXVI. The average of these correlations
is .49.

2 Moss, F. A., “Preliminary Report of a Study of Social Intelligence
and Executive Ability,” Public Personnel Studies, Vol. IX, No. 1, 1931.


Table XXVI

CORRELATIONS BETWEEN SOCIAL AND ABSTRACT
INTELLIGENCE

Author                 Tests Correlated                            Correlation    No. of Cases
Broom                  Thorndike Test w. Social Test                   .56            258
Hunt                   Mental Alertness Test w. Social Test            .54            243
Hunt                   O’Rourke Test w. Social Test                    .57            102
Hunt                   McCall Multi-Mental Test w. Social Test         .25            130
Pintner and Upshall    Group Verbal Test w. Social Test                .68             33
Garrett and Kellogg    Thorndike Test w. Social Test                   .42            118
Strang                 Group Verbal Test w. Social Test                .44            311
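
The average of the correlations in Table XXVI can be checked with a few lines of Python. The sketch below uses only the rows as they appear in the table; the variable and function names are illustrative. It also computes an average weighted by number of cases, which is an added comparison and not a figure given in the text.

```python
# (author, correlation, number of cases) for each row listed in Table XXVI.
studies = [
    ("Broom", 0.56, 258),
    ("Hunt", 0.54, 243),
    ("Hunt", 0.57, 102),
    ("Hunt", 0.25, 130),
    ("Pintner and Upshall", 0.68, 33),
    ("Garrett and Kellogg", 0.42, 118),
    ("Strang", 0.44, 311),
]


def simple_mean(rows):
    """Unweighted average of the correlations."""
    return sum(r for _, r, _ in rows) / len(rows)


def weighted_mean(rows):
    """Average of the correlations weighted by number of cases."""
    total_cases = sum(n for _, _, n in rows)
    return sum(r * n for _, r, n in rows) / total_cases


print(round(simple_mean(studies), 2))    # unweighted average
print(round(weighted_mean(studies), 2))  # weighted by number of cases
```

The unweighted average of the listed rows agrees with the .49 stated in the text. Weighting by number of cases gives a slightly lower value, because the smallest study reports the highest correlation.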


Two studies of relationship between Social Intelligence 
Test scores and mechanical intelligence scores have been 
reported. One of these studies utilizes the O’Rourke Mechanical
Aptitude Test, the correlation being .22; the
other utilizes the MacQuarrie Mechanical Ingenuity Test,
and the correlation is .11.

In order to draw any conclusions about the relation- 
ships between the three types of intelligence, we need 
also to know something of the relationship between ab- 
stract intelligence and mechanical intelligence, and some- 
thing of the relationship between different tests designed 
to measure the same one of the three divisions. The 
average of numerous correlations reported between abstract
and concrete intelligence, as measured by the available
tests, is about .60. The correlations between various
abstract intelligence tests average about .76 to .80. Relatively
few studies have been made correlating the various
mechanical or concrete tests, and since only one or two 
attempts have been made to measure social intelligence, 
practically no studies are available correlating social in- 
telligence as measured by several different tests. If we 
can take reliability studies of the social intelligence tests 
as indicative of their correlations with themselves, we can 
say that these correlations also would be around .75 
to .80. 

From the above studies, we may conclude that the three 
types of intelligence as designated by Thorndike can be 
somewhat differentiated, although our tests for measur- 
ing the three undoubtedly measure a large element of a 
common factor, which common factor seems to be in the 
main abstract intelligence. In other words, the great 
majority of tests of the pencil-and-paper type measure an 
element of abstract intelligence in addition to any other 
separate element which they may be designed to measure. 
It would seem, however, that the concrete intelligence 
tests and the one social intelligence test which we have 
discussed measure something sufficiently different from 
abstract intelligence so that they are of some value in 
our studies of abilities. 

(b) To sex. Studies of the Social Intelligence Test
have shown in average scores of women a small but re- 
liable difference from scores of men of equal age and edu- 
cational status. The parts of the Social Intelligence Test 
in which women show the greatest superiority are those 
parts measuring Judgment in Social Situations and Ob- 
servation of Human Behavior. 

(c) To occupation. The Social Intelligence Test has
been given to a considerable number of occupational
groups, and norms have been published for some of these
groups. The median score attainments of several are
given in Table XXVII. Studies of the relationship between
adequate measures of social intelligence and of occupational
success should be very valuable in connection
with the selection of workers for those positions in which
the ability to get along with others is of supreme importance.


Table XXVII

MEDIAN SOCIAL INTELLIGENCE SCORES FOR
VARIOUS OCCUPATION GROUPS ³

Occupation                                               Number of Cases    Median Score
Administrative and executive employees                         100              117
Teachers                                                       250              112
High-grade secretarial employees                                50              111
Salesmen                                                        25              107
Engineering employees: draftsmen, electricians, etc.            45              105
Clerical and stenographic employees                            200               95
Other lower-grade office workers and helpers                   300               84
Sales clerks in department store                                35               81
Nurses                                                          75               78
Lower-grade industrial workers                                 150               65


II. The Vineland Social Maturity Scale 

Edgar A. Doll, Director of Research at the Training 
School for Feebleminded at Vineland, New Jersey, has 
recently been working on a scale for measuring social 
competence. He believes that social competence is suffi- 
ciently differentiated from intelligence to make its meas- 
urement important in his work with lower-grade individu- 
als. Study and further use of the scale will undoubtedly 
prove its value for application to many problems among 
normal and superior individuals. 

We quote from Doll ⁴ his statement of the problem of
measurement of social competence:

3 Hunt, T., op. cit.

4 Doll, Edgar A., “The Measurement of Social Competence,” Proceedings
of the Fifty-ninth Annual Session of the American Association on
Mental Deficiency, Chicago, Illinois, April 25-27, 1935.





The attempt to measure social competence immediately
encounters critical difficulties. Shall we identify
social competence with social success as reflected in 
fame or fortune? Shall we define social competence as 
the ability to manage oneself and one’s own affairs inde- 
pendently, or, should we also require some contribution 
to the welfare of others? Social competence is affected 
by intelligence, by personality, by emotionality, by con- 
duct, skill, opportunity, training, experience, and so on. 
But social competence is not to be measured by any of 
these traits alone. Social competence is measured 
rather by the effective social uses to which these traits 
are put. The ultimate significance of any human trait 
is its actual capitalization for social purposes. We 
may even go so far as to say that no behavior is impor- 
tant except as it is socially significant. 

Social competence is also influenced by physical health
and well-being, by disabling diseases, by sensory handicaps,
invalidism, crippling, or senescence. Environmental
influences, too, play a role; for example, the restraints
of family life, the solicitude of elders, the strength of 
motivation, incentive, and habit, the stimulus of ideals, 
the customs of time and place, the advantages or limita- 
tions of cultural status, and so on. In particular, we 
are obliged to distinguish between sociality and soci- 
ability, meaning by the former social competence and by 
the latter social affability. 

In constructing the Vineland Social Maturity Scale 
we have reckoned with these and other difficulties. The 
extent to which we may have succeeded in resolving 
these complicating influences will be reflected in the prac- 
tical success of this instrument as further investigation 
proceeds. We have made an attempt to measure social 
competence as a composite capitalization of the sum total 
of individual capabilities which is reflected in progressive 
stages of personal independence or freedom from the as- 
sistance and supervision of others. 

1. The Vineland scale. In its present form, the Vineland
Social Maturity Scale consists of 117 items, arranged
in order of their development in normal persons from birth
to adult life. Each item is designed to indicate social
competence in terms of the degree to which the individual
is capable of looking after himself, getting along with
others, and contributing to the welfare of others. The 
items of the scale include such general activities as self- 
help, locomotion, occupation, communication, self-direc- 
tion, and socialization. The manual of directions defines, 
explains, and illustrates each item. The scale is meant 
to be not a rating scale but an objective schedule of de- 
velopmental behavior capable of being standardized in 
definite quantitative terms. The information for mak- 
ing a measurement is to be obtained from observation 
and by questioning the subject and others familiar with 
the subject’s behavior. The method of standardization 
being employed by Doll is the Binet method. Results 
are expressed as Social Age and, by dividing by chron- 
ological age, as Social Quotient. Sample items from the 
middle range of the scale are quoted to show the general 
nature of the material. 

Bathes self-assisted 

Uses table knife for cutting 

Uses pencil or crayon for drawing 

Uses skates, sled, wagon 

Goes about neighborhood unattended 

Cares for self at table 

Prints simple words 

Combs or brushes hair 

Plays competitive exercise games 

Does routine household tasks 

Goes to bed unassisted 

Plays simple table games 

Goes to school unattended 

Disavows literal Santa Claus 

Uses pencil for writing 
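
The Social Quotient described above is the Social Age divided by chronological age. The sketch below assumes, by analogy with the intelligence quotient, that the ratio is multiplied by 100; the text itself mentions only the division. The function name and the sample ages are hypothetical.

```python
def social_quotient(social_age: float, chronological_age: float) -> float:
    """Return the Social Quotient as 100 * Social Age / chronological age.

    The factor of 100 is an assumption made by analogy with the IQ.
    Both ages must be expressed in the same unit (for example, years).
    """
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    return 100.0 * social_age / chronological_age


# Hypothetical subject: Social Age of 8 years, chronological age of 10 years.
print(social_quotient(8.0, 10.0))  # 80.0
```

A quotient of 100 under this convention would mean that Social Age equals chronological age.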





2. Studies of the scale. The studies of the scale re- 
ported by Doll up to the present are all of a preliminary 
nature. He reports reliability of the scale based upon a 
second application as applied to 30 subjects. Fifteen of 
these subjects, of average social age 8.1, were reexamined 
by the same examiner, with a different informant the sec- 
ond time. The median difference between the two ex- 
aminations was 0 ± .6. The other fifteen subjects were 
reexamined by a second examiner, who used the same 
informant that was used by the first examiner. The re- 
sults showed a median difference of + .2 ± .4. These 
studies on a few subjects would seem to indicate satisfac- 
tory reliability for the measuring device. Validity is less 
satisfactorily indicated in the preliminary reports, but 
the lack of a generally good criterion for validating the 
scale would leave us to await further study before judg- 
ing the scale’s validity. The preliminary results indicate 
a relationship between Social Age and Mental Age, and 
between Social Age and Cultural Status, of the subjects. 

3. Uses of the social maturity scale. Doll suggests for 
his scale the following uses in relation to institutional 
administration. Extension of the uses to other problems 
can be made likewise. Doll’s suggestions follow:

1. The Scale affords a measure of social competence 
which is useful in establishing the fundamental cri- 
terion of mental deficiency and, consequently, for 
discriminating between the feebleminded and the in- 
tellectually subnormal. 

2. The Scale affords a measure of progressive social 
growth, arrest, or deterioration, and, ipso facto, af- 
fords a schedule by means of which an adequate de- 
velopmental history may be taken to distinguish 
between arrested mental development and deteri- 
orated mental states. 



3. The Scale affords an objective measure of relative
social competence within a feebleminded group, 
and suggests the extent to which special handicaps 
need or need not be taken into consideration. 

4. The Scale affords a means of measuring improve- 
ment in social ability as brought about by treatment 
or training, and indicates the direction as well as 
the amount of such improvement. 

5. The Scale meets a basic need in mental diagnosis
which is of critical importance in borderline cases 
and provides a standard measure that may be em- 
ployed for legal purposes, since the legal definitions 
of idiot, imbecile, and moron have always been ex- 
pressed in social terms. 

6. The Scale is an aid in the classification of institu- 
tional inmates in respect to housing groups, training 
groups, and work groups, and as a general criterion 
in the study of social adjustment. 

7. The Scale is particularly useful in considering pa- 
tients for discharge. 

8. The Scale affords a means of discovering unsuspected 
social aptitude of those patients whose unobtrusive 
personality and social withdrawal tendencies render 
them inconspicuous and lead to underestimation of 
their real ability. 

9. The Scale affords a useful device in mental hygiene 
consultation where social capability is likely to be 
a critical factor in relation to social adjustment. 

10. The Scale affords a practical measure of insight that 
should be useful with abnormal patients. 

11. The Scale affords a guide for child training and 
parent education, suggesting the direction in which 
training may be advisable and the limits within 
which success is likely to be achieved in relation to 
ability. 

12. The Scale affords a definite advantage over personal 
methods of estimating social competence. 




13. Finally, the Scale provides a basic criterion for the 
investigation of many different problems in social 
science which have heretofore been complicated by 
lack of a suitable criterion. 



CHAPTER XXI 


The Measurement of Emotions 
by Verbal Tests 


IT IS rather difficult to give an exact definition of “emotions”
as they are measured by the types of tests to be
considered in this chapter. Generally the tests have to 
do with our tendencies to emotional reactions. The re- 
sults or the scores on the test indicate that we react with 
emotion to more or fewer things than the average person 
of our age and sex does; or they indicate that in us emo- 
tional reactions are provoked by those things which ordi- 
narily provoke emotions, or by peculiar things which 
ordinarily do not call out emotions; or they may indicate 
that we experience more intense or peculiar kinds of emo- 
tional reactions to those things producing emotions. 

We may think of emotions as being tested in two general 
ways: by a direct method and by an indirect method.
The direct method consists of putting the subject to be 
tested in a situation which is calculated to arouse emo- 
tional responses, and then measuring his reaction under 
these conditions. Such measurements have been carried 
out numerous times in the psychological laboratory, but 
the procedures and means of measurement are often cum- 
bersome, time-consuming, and unstandardized, so that 
direct measurements of emotions have not reached a stage 
of general application. From the standpoint of validity 
they probably have much to recommend them over the
indirect methods. Chapter XXVI is concerned with
some of these measurements.

The indirect method is usually some verbal type. 
Words are used as stimuli, and the individual being meas- 
ured is asked to respond by words. The method may 
investigate the reaction which the individual reports 
that he experiences when he thinks of the word. Or an 
imaginary situation may be set up and the individual may 
be asked to report what his reaction would be in the 
imagined case. The indirect methods usually involve a 
certain amount of introspection and retrospection. The 
examiner does not observe the subject’s reactions directly;
he can observe and examine only the reports of reactions. 

I. Adjustment Questionnaires and Tests 

Several scales or tests have been worked out for meas- 
uring emotional adjustments. These usually aim at dis- 
covering abnormalities of reaction, idiosyncrasies or 
peculiarities of emotions, or neurotic traits. The items 
of many of them are simply scored for normality or ab- 
normality of response. Practically all of them have their 
chief value and usefulness in the discovery of abnormali- 
ties or neurotic traits, and are of only limited, if of any, 
usefulness within the normal range of behavior. Exam- 
ples of these are the following: 

1. Personal Data Sheet.¹ This is a questionnaire constructed
by Woodworth for discovering an individual’s
psychoneurotic tendencies. It consists of 116 questions
to be answered by Yes or No, such as: “Does it make you
uneasy to go into a tunnel or subway?” “Do your interests
change quickly?” The score is determined by the number
of neurotic or unfavorable answers, the unfavorable
answer sometimes being Yes, sometimes No. The scale
is suitable for adults, being originally designed for use
with soldiers. It is one of the earliest of the adjustment
scales, and at present there are available a number of revisions
based upon Woodworth’s original Data Sheet.
Ellen Mathews published in 1923 a revision suitable for
children eight years of age and over. S. D. House published
in 1927 a revision for adults, which he called a
Mental Hygiene Inventory.

1 Published by the C. H. Stoelting Company, Chicago, Ill.
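
The scoring rule for the Personal Data Sheet, a count of unfavorable answers where the unfavorable answer is sometimes Yes and sometimes No, can be sketched as follows. The scoring key shown here is hypothetical: the text does not state which answer is counted as unfavorable for any question, and the third question is an invented illustration, not quoted from the text.

```python
# Hypothetical key: the answer assumed to count as unfavorable for each
# question. The keying is an assumption made for illustration only.
UNFAVORABLE = {
    "Does it make you uneasy to go into a tunnel or subway?": "Yes",
    "Do your interests change quickly?": "Yes",
    "Do you usually sleep well?": "No",  # invented illustrative item
}


def neurotic_score(answers: dict) -> int:
    """Count the answers that match the unfavorable key.

    Unanswered questions are simply not counted.
    """
    return sum(
        1
        for question, unfavorable in UNFAVORABLE.items()
        if answers.get(question) == unfavorable
    )


# Hypothetical respondent.
answers = {
    "Does it make you uneasy to go into a tunnel or subway?": "Yes",
    "Do your interests change quickly?": "No",
    "Do you usually sleep well?": "No",
}
print(neurotic_score(answers))  # 2
```

Because the key is stored per question, items keyed Yes and items keyed No are handled by the same comparison.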

2. Personality Schedule.² This is a questionnaire very
similar in form and scoring to the Woodworth scale. It
contains a somewhat larger number of questions, 223 in
all. It is designed for discovering personal and social
maladjustments, particularly in college students. The 
norms give score ranges for various degrees of maladjust- 
ment from the normal to the definitely maladjusted. 

3. Colgate Mental Hygiene Test.³ The test is designed
to detect abnormal emotional trends in adults. It
has been most used with college students. The test is in 
the graphic scale form; answers to the questions are to be 
indicated by checks along a line beneath which are terms 
or phrases descriptive of the various possible reactions or 
answers. There are three parts to the scale: Part I, con- 
taining questions dealing with psychasthenia; Part II, 
containing questions dealing with schizophrenia (split 
personality); and Part III, containing questions dealing
with neurasthenia. Checks in the “neurotic” end or sec- 
tion of the scale are scored. The total score represents 
the extent of abnormal trends. 


2 By Thurstone, L. L. and T. G. Published by University of Chicago
Press, Chicago, Ill.

3 By Laird, Donald. Published by Hamilton Republic, Hamilton,
N.Y.





4. Personality Inventory.⁴ This test contains 125 ques-
tions to be answered by Yes, No, or ?. The first 15 ques-
tions are reproduced here as samples:


 1. Yes  No  ?  Does it make you uncomfortable to be “different” or unconventional?
 2. Yes  No  ?  Do you day-dream frequently?
 3. Yes  No  ?  Do you usually work things out for yourself rather than get someone to show you?
 4. Yes  No  ?  Have you ever crossed the street to avoid meeting some person?
 5. Yes  No  ?  Can you stand criticism without feeling hurt?
 6. Yes  No  ?  Do you ever give money to beggars?
 7. Yes  No  ?  Do you prefer to associate with people who are younger than yourself?
 8. Yes  No  ?  Do you often feel just miserable?
 9. Yes  No  ?  Do you dislike finding your way about in strange places?
10. Yes  No  ?  Are you easily discouraged when the opinions of others differ from your own?
11. Yes  No  ?  Do you try to get your own way even if you have to fight for it?
12. Yes  No  ?  Do you blush very often?
13. Yes  No  ?  Do athletics interest you more than intellectual affairs?
14. Yes  No  ?  Do you consider yourself a rather nervous person?
15. Yes  No  ?  Do you usually object when a person steps in front of you in a line of people?


This scale is one of the most recent of the adjustment
type of tests, and makes use of considerable material
demonstrated to be of value in the older scales. The
scoring of the questions is somewhat more refined than
that for the older scales. To each question weights have
been assigned varying from plus 7 to minus 7 in accord-
ance with the diagnostic value of the question. The in-
ventory can be scored by four separate scales or scoring
devices. By Scale 1 the final score indicates neurotic
tendencies; by Scale 2, self-sufficiency; by Scale 3, intro-
version-extroversion; and by Scale 4, dominance-submis-
sion. The correlation between Scale 1 and 3 scores is
very high, however, so that little is gained by using both
of these scorings, and the author recommends for general
use that only three scorings be done. Standards are
available on the test for high school students, college
students, and adults.

⁴ By Bernreuter, R. G. Published by Stanford University Press, Stan-
ford, Calif. Sample questions quoted by permission.
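The weighted scoring described above may be sketched in a few lines. The sketch assumes that every possible answer to a question carries its own weight on each scale; the weights shown are invented for illustration, lie within the stated range of plus 7 to minus 7, and are not taken from the published scoring keys.

```python
# Hypothetical weights: scale -> item -> answer -> weight in the range -7..+7.
WEIGHTS = {
    "neurotic": {
        1: {"Yes": 3, "No": -3, "?": 0},
        2: {"Yes": 5, "No": -4, "?": 1},
    },
    "dominance": {
        1: {"Yes": -2, "No": 2, "?": 0},
        2: {"Yes": -1, "No": 1, "?": 0},
    },
}


def scale_scores(answers, weights=WEIGHTS):
    """For each scale, sum the weight attached to every answer given."""
    return {
        scale: sum(items[i][a] for i, a in answers.items() if i in items)
        for scale, items in weights.items()
    }


print(scale_scores({1: "Yes", 2: "No"}))  # {'neurotic': -1, 'dominance': -1}
```

The same answer sheet is thus scored once for each scale, which is how a single administration can yield four separate results.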

Upon what basis should we judge the usefulness of such 
measurements as those given by the adjustment scales? 
The various scales have usually been found to be reliable, 
giving approximately the same results when applied more 
than once at short intervals to the same group. Most of 
the reliability coefficients reported by the authors are 
close to .90. The validity of a scale of the type we have 
just discussed depends primarily upon the working out of 
the scoring device. The classification of answers for scor- 
ing as representative of adjustment or unadjustment, 
normality or abnormality, must be based upon studies 
conducted among trial groups which have been well se- 
lected for normality and abnormality. Herein lies a diffi- 
culty that is hard to overcome entirely in the construc- 
tion of such scales. For most of the scales studies have 
been reported in which the relationship of total scores as 
finally rated to neurotic trends or unadjustment trends is 
analyzed. Mathews⁵ reports correlations of .52 and .66
between scores on the Personal Data Sheet and composite
ratings on “nervous instability” for groups of girls in a
protectory. The ratings were made by four competent
judges. Cady⁶ reports correlations averaging .40 between
Personal Data Sheet ratings and estimates of incorrigibil-
ity made on a group of 150 boys in institutions for delin-
quents.

⁵ Mathews, E., “A Study of Emotional Stability in Children,” Journal
of Delinquency, Vol. 8, 1923, pp. 1-40.
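The reliability and validity figures quoted above are coefficients of correlation. A minimal sketch of the usual product-moment computation is given here, on the assumption that the reported coefficients are of that kind; the scores are invented and stand for the same five persons tested twice at a short interval.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for the same five persons on two occasions.
first_testing = [12, 30, 21, 8, 17]
second_testing = [14, 28, 22, 10, 15]

print(round(pearson_r(first_testing, second_testing), 2))
```

The same function serves for a validity study if the second list is replaced by an outside criterion, such as judges' ratings of the same persons.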

The various scales for measuring emotional adjustment 
show fairly high agreement among themselves. While 
such an agreement is not proof of validity, it portrays a 
much more hopeful situation than would be true if the 
various scales designed for measuring the same traits 
showed marked disagreement. Bernreuter reports a cor- 
relation of .94 between his Inventory scored for neurotic 
traits and Thurstone’s Personality Schedule. When 
scored for introversion, his Inventory shows a correlation 
of over .75 with other tests for introversion. 

These various studies certainly suggest value in such 
measurements. The limitations of the scales seem to be 
primarily in their lack of fine discrimination, particularly 
at levels other than those of the definitely maladjusted. 
However, we might expect a more extended use of the 
scales to demonstrate their practical value in vocational 
guidance, clinical work, and other types of personality 
study. 


II. Extroversion-Introversion Scales 

Because of the importance of tendencies to extroversion
or introversion in the affairs of life, a number of measur-
ing devices have been worked out for indicating these ten-
dencies. We have already noted that Bernreuter’s scale
can be scored for introversion. Other tests along this line
include Laird’s Personal Inventory, C2 and C3; All-
port’s Ascendance-Submission Scale; and Neyman’s and Kohl-
stedt’s Personal Traits Rating Scale.

⁶ Cady, V. M., “The Estimation of Juvenile Incorrigibility,” Journal of
Delinquency Monographs, Vol. 2, 1923.

Extroverts are individuals who express their emotions 
in action and association with others. They are character- 
ized by sociability, interest in others, lack of worrying or 
moodiness, lack of self-consciousness, self-sufficiency, and 
fluency in talking. Introverts, on the other hand, are 
those whose emotions are expressed largely within them- 
selves. They are characterized, in contrast to the extro- 
verts, by lack of sociability, self-consciousness, enjoyment 
of more or less solitary types of recreation, day-dreaming, 
easily hurt feelings, greater tendency to worry, and fewer
but more intimate friends. Tendencies in one or the other
of these directions of reaction are noticeable within “nor- 
mal” groups of individuals. We can guess that such ten- 
dencies may be closely related to the type of job or voca- 
tion in which one can best succeed, and may have a great 
influence on one’s social adjustments in his group of friends 
and associates. Extremes in either direction of introver- 
sion or extroversion become abnormal, and there are defi- 
nite psychotic or insane conditions which typify the ex- 
tremes. Dementia praecox and involution melancholia 
are examples of extreme introversion among the insane; 
mania, of extreme extroversion. 

Laird, working at the Colgate University Psychological 
Laboratory, has done a considerable amount of work on 
the measurement of extroversion-introversion, and we 
may discuss his work as typical of measurement in this 
field. He has developed two scales for measurement, one 
for rating to be done by another individual, and one for 
self-rating. Both scales use the graphic rating technique. 
The first page of one of these scales is reproduced
below.⁷ For each question or trait rated, one end of the
rating line indicates an introvert tendency and the other
end indicates an extrovert tendency. The score is the
number of checks occurring on the introvert end of the
line, so that total score is representative of degree of in-
troversion. Laird selected the questions for his scale
from a large number of questions which he studied for
their differentiating value in separating trial groups of
definitely known extroverts from groups of definitely
known introverts.

Personal Inventory
C 2

Name

Directions: Describe yourself by answering the questions asked in the large type. Do this by making
a check mark through one of the half-inch lines. After reading over the phrases in small type after each ques-
tion, think back over your life for the past few months and determine where the check mark should be made to
describe yourself accurately. Describe your average behavior and thoughts.

There is no time limit to this. Read each line entirely before making a check mark. Describe your
average thoughts and behavior for the past few months only. Check only one of the half-inch sections in answer
to each question.

1. How steadily have you worked at the tasks of the day?
2. How have possible misfortunes entered into your thinking?
3. How easily have your feelings been hurt by remarks or actions referring to you?
4. How have you considered the feelings of others?
5. How have you acted and felt at social affairs?
6. How well have you remembered most of the errands and details of your daily routine?
7. In social conversation how have you been?
8. How have you decided upon matters of daily conduct?
9. How have you generally been about making loans?

(Rating lines and the descriptive phrases in small type beneath them not reproduced.)

First Page of Laird’s Personal Inventory, C 2

⁷ From Personal Inventory, C 2. Reproduced by permission of the
Hamilton Republic, Hamilton, N. Y.
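The selection of questions by their differentiating value, and the scoring by count of checks at the introvert end, may be sketched together. The measure of differentiating value used here (a difference between the two groups in the proportion of introvert-end checks) and the cutoff of .30 are assumptions made for illustration; the text does not state what criterion Laird applied.

```python
def discriminating_items(introverts, extroverts, min_gap=0.3):
    """Keep items checked at the introvert end much more often by known introverts.

    Each group is a list of records mapping item -> True when the check
    fell on the introvert end of the rating line.
    """
    keep = []
    for item in introverts[0].keys():
        p_in = sum(r[item] for r in introverts) / len(introverts)
        p_ex = sum(r[item] for r in extroverts) / len(extroverts)
        if p_in - p_ex >= min_gap:
            keep.append(item)
    return keep


def introversion_score(record, items):
    """Number of retained items checked at the introvert end."""
    return sum(1 for i in items if record[i])


# Hypothetical trial groups of known introverts and known extroverts.
known_introverts = [{"q1": True, "q2": True}, {"q1": True, "q2": False}]
known_extroverts = [{"q1": False, "q2": True}, {"q1": False, "q2": False}]

items = discriminating_items(known_introverts, known_extroverts)
print(items)                                               # ['q1']
print(introversion_score({"q1": True, "q2": True}, items))  # 1
```

In this invented example the second question is dropped because both groups check it equally often, so it contributes nothing toward separating them.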

Studies in colleges and industries utilizing the Laird 
test have shown fairly high correlations with other tests 
of introversion; moderately high negative correlations be-
tween the test scored for introversion and that scored for
extroversion; higher average introversion scores for women
than for men; very little relationship between introver-
sion and general intelligence; higher average scholastic
performance in school by introverts than by extroverts; 
slightly increasing introvert scores with age; no constant 
racial differences on the test; tendencies toward greater 
extroversion in groups of foremen and executives; and 
tendencies toward greater introversion in groups of office 
workers, clerks, stenographers, accountants, and research 
workers. 


III. An Annoyance Test⁸

Cason has constructed a very interesting test in the field 
of emotional testing. His study and test concern the situ- 
ations and stimuli which produce certain feelings of un- 
pleasantness or annoyance, and the relations between the 
stimuli and the responses. In his discussion of the reasons 
for development of such a test Cason⁹ has emphasized
the marked individual differences among people in the
kinds of feelings they experience under the same external
conditions; and the relationship of these individual dif-
ferences to individual happiness, mental and emotional
maladjustments and abnormalities, and success and satisfac-
tion in one’s vocation.

⁸ Published by the C. H. Stoelting Company, Chicago, Ill. Quotations
from the test are made by permission.

⁹ Cason, H., “An Annoyance Test and Some Research Problems,”
Journal of Abnormal and Social Psychology, Vol. XXV, No. 2, July-
Sept., 1930, pp. 224-236.

The test itself consists of a list of 217 items (annoy- 
ances) to be rated according to the following scale: 

3 - Extremely annoying
2 - Moderately annoying
1 - Slightly annoying
0 - Not annoying
X - Have not been in the situation

The first fifteen items are quoted as samples: 

( ) 1. A person behaving in an affected manner.
( ) 2. A person with a gushing manner.
( ) 3. A person losing his temper.
( ) 4. A person habitually arguing.
( ) 5. A person in an automobile I am driving telling me how to drive.
( ) 6. To see a person who is driving an automobile take unnecessary chances.
( ) 7. To see a boisterous person attracting attention to himself in public.
( ) 8. To hear a person talking in an unnecessarily loud voice.
( ) 9. A person continually trying to borrow some of my things.
( ) 10. To hear a person chewing gum loudly.
( ) 11. A child not obeying his father or mother.
( ) 12. A mother continually correcting her child in public.
( ) 13. To see a person’s nose running.
( ) 14. To see a person blow his nose without using a handkerchief.
( ) 15. A person not covering his mouth when he coughs or sneezes.
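How the ratings are combined into an annoyance score is not stated here. A minimal sketch follows, on the assumption that the score is the average of the numerical ratings and that items marked X are left out of the average; both points are assumptions made for illustration.

```python
def annoyance_score(ratings):
    """Average of the 0-3 ratings, ignoring items marked X (assumed rule)."""
    rated = [int(r) for r in ratings if r != "X"]
    return sum(rated) / len(rated) if rated else None


# Hypothetical ratings for five items.
print(annoyance_score(["3", "0", "X", "2", "1"]))  # 1.5
```

Leaving the X items out keeps a person from appearing less annoyed merely because he has met fewer of the situations.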





Cason’s original material was collected from items which 
large numbers of people listed as things that annoyed 
them. Somewhat over 500 statements of annoyances 
were derived from this original material. These were 
printed and submitted for rating to several hundred sub- 
jects of all ages between 10 and 90 years. The final selec- 
tion of items was based upon the criteria of (1) frequency 
of the annoyance in everyday life, (2) age distribution of 
the people who were annoyed by the thing or activity, (3) 
objectivity, (4) universality, (5) permanence, and (6) 
psychological and social significance. Initial results re- 
ported for the test show slightly higher average annoyance 
scores for females than for males in all age groups; and 
annoyance scores slightly increasing with age up to 40 to 
60, then slightly decreasing up to 90. 

IV. The Pressey X-O Test¹⁰

An ambitious attempt to work out a test of emotions 
was made some years ago by Pressey. He designated his 
test as a group scale for investigating the emotions. His 
scale emphasizes the measurement of abnormal mental 
attitudes and pathological emotional conditions, and has 
been found of use primarily in dealing with abnormal
groups of various types. His test is in four parts. 

Part I is a test for discovering various types of un-
pleasant feeling. The test consists of lists of words,
five words being contained in each list. The subject is
instructed to cross out every word which is unpleasant,
and finally to select from each list the one word which
is most unpleasant to him. The words are so chosen
that by an analysis of his answers an individual may be
identified as to his particular type of fears. Each line
contains a word belonging to each of four classes (dis-
gust, fear, sex, and self-feeling) in addition to one neu-
tral word termed a “joker,” to indicate whether the indi-
vidual is following the instructions. The first four lists
are as follows:

1. disgust fear sex suspicion aunt.
2. roar divorce dislike sidewalk wiggle.
3. naked snicker wonder spit fight.
4. failure home rotting snake hug.

¹⁰ Published by C. H. Stoelting Company, Chicago, Ill. Quotations
from the test are made by permission.

Part II is a modification of the free-association test.
This test aims through the association method to discover
pathological, abnormal, and criminological attitudes.
The test consists of key words (given in capital letters)
each of which is followed by five words which are to be
crossed out if connected in the mind of the subject with
the word in capital letters. The first four, as samples,
are as follows:

1. BLOSSOM flame flower paralyzed red sew.
2. LAMP poor headache match dogs light.
3. BATH naked choke tree alone danger.
4. KING father baseball queen rights razor.

Part III is an ethical discrimination test. It is
adapted from an earlier test worked out by Fernald.
The test consists of lists of words in which the subject 
is instructed to cross out everything which he thinks is 
wrong, or which he thinks that a person should be blamed 
for. Finally, he is instructed to encircle the one thing 
in each list which he thinks is the worst. The first four 
lists are as follows: 

1. begging swearing smoking flirting spitting.
2. fear hate anger jealousy suspicion.
3. dullness weakness ignorance innocence meekness.
4. careless fussy reckless silly childish.

Part IV is a test for discovering anxiety tendencies. 
The subject is told to cross out all the things in each list 
about which he has ever worried, and finally to circle the 
thing in each list about which he has worried the most. 
The first four lists are as follows:

1. injustice noise self-consciousness discouragement germs.
2. clothes conscience heart-failure poison sleep.
3. sickness enemies money blushing failure.
4. falling queerness religion dizziness boss.

As in Part I, the words in Part IV are arranged so as to 
make it possible to discover particular types of anxiety 
tendencies. Each list contains one word representing each of
the following anxiety attitudes: paranoid, or suspicion;
neurotic; self-conscious or shut-in; melancholic or self- 
accusatory; and hypochondriacal. In the first list, for 
example, the words in order represent these five types of 
anxiety. In the case of this test and also in the case of 
Part I, the particular pathological types can be recog- 
nized only in those cases where they are definite enough 
to color the answers throughout the test. The marking 
of a paranoid word in only one list would not be signifi- 
cant, but if the words throughout the test which are 
marked are consistently of the paranoid type, they might
indicate a good deal about the subject’s emotional at-
titudes.
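The tallying of marked words by anxiety type may be sketched as follows. The key for the first list follows the order given in the text; keys for the remaining lists are not given there and would have to be supplied from the test itself, so the sketch simply skips any word it cannot classify.

```python
from collections import Counter

# Type key for the first list, in the order stated in the text.
# Keys for the other lists are not given here and are left out.
TYPE_KEY = {
    "injustice": "paranoid",
    "noise": "neurotic",
    "self-consciousness": "self-conscious",
    "discouragement": "melancholic",
    "germs": "hypochondriacal",
}


def anxiety_profile(marked_words, key=TYPE_KEY):
    """Tally marked words by anxiety type, skipping words absent from the key."""
    return Counter(key[w] for w in marked_words if w in key)


profile = anxiety_profile(["injustice", "germs", "money"])
print(profile["paranoid"], profile["neurotic"])  # 1 0
```

With a complete key, a type would be taken as significant only when its tally stands well above the others over the whole test, in keeping with the caution stated above.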

From his emotions test, Pressey derives two general
scores. One of these he terms the total emotionality
score, which is the sum of all the words crossed out in the
whole test. Presumably, the person who crosses out a
large number of words is more easily aroused emotion-
ally, has more definite emotional attachments, and is
perhaps more unstable emotionally than the person who
crosses out only a few words. The second score which
Pressey derives is an idiosyncrasy score, which indicates
the extent to which a subject’s responses depart from those of the
normal or average person. This score represents the
total number of circled words, one in each line, which dif-
fer from the modal choices of the average normal person.
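Both scores reduce to simple counts, as the following sketch shows. The responses and the modal choices are invented for illustration; the published modal choices are not given in the text.

```python
def total_emotionality(crossed_out):
    """Total number of words crossed out; one list of words per line of the test."""
    return sum(len(words) for words in crossed_out)


def idiosyncrasy(circled, modal):
    """Number of lines on which the circled word differs from the modal choice."""
    return sum(1 for mine, usual in zip(circled, modal) if mine != usual)


# Hypothetical responses to three lines of the test.
crossed = [["disgust", "fear"], ["divorce"], ["spit", "fight"]]
circled = ["fear", "divorce", "spit"]
modal = ["disgust", "divorce", "fight"]  # assumed modal choices

print(total_emotionality(crossed), idiosyncrasy(circled, modal))  # 5 2
```

The first score grows with every word marked, whatever its kind; the second grows only where the subject's choice of the worst word departs from the usual one.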

Several studies have been made utilizing the Pressey 
test. Reliability of the test as reported in the studies 
seems fairly satisfactory. McGeoch and Whitely, using 
college students as subjects, report reliability correlations 
for the four parts given 48 hours apart. All the coeffi- 
cients are over .80. With longer intervals between the 
testings, their correlations are somewhat lower. Flem- 
ming reports a reliability correlation of .97 for the total 
test scored for total emotionality, and .60 scored for idio- 
syncrasy, for college freshmen. 

As to the value of the test, the various studies which 
have been made leave us with no satisfactory conclusions. 
Our general impression is likely to be that the test meas- 
ures a composite of many factors and that a total emo- 
tionality or idiosyncrasy score means very little, if any- 
thing. In fact, Pressey himself frankly stated in making 
his scale that he had assembled a miscellaneous group of 
material for experimental study. His own statement is 
significant: 

The scores on the entire examination are the blurred 
result of a number of factors, and are of relatively little 
importance. However, it is possible, from the mass of 
data yielded by the examination, to combine certain 
items in such a way as to obtain, from the single exam- 
ination, highly differential information with reference 
to a number of problems.¹¹

¹¹ Pressey, S. L., “A Group Scale for Investigating the Emotions,”
Journal of Abnormal and Social Psychology, 16: 55-64 (April 1921).


Several studies of the test have been made on criminal 
as compared with non-criminal individuals. The various 
studies indicate that total scores on the test do not dis- 
tinguish criminal from non-criminal groups. However, 
criminals show marked deviations from the normal on 
separate elements of the test, and considerable informa- 
tion of value may be obtained by a detailed analysis of 
the responses. Chambers has analyzed the test for its 
possibilities of distinguishing the emotionally mature 
from the emotionally immature, and has found that some 
of the items constitute very good tests of emotional ma- 
turity. This same investigator also studied the test for 
its possibilities in the differentiation of college students 
as to achievement. He found certain items in the test 
which furnished reliable means of differentiating between 
those who were good students and those who were poor 
students. For example, he found in Part IV (things 
worried about) that the words “books,” “self-conscious,” 
“accidents,” “rivals,” “parties,” were marked significantly 
more often by good students, and the words, “work,” 
“failure,” “police,” “wrecks,” and “dreams,” were marked 
significantly more often by poor students. 

From the evidence that has been collected about the 
Pressey Test, we may conclude that it contains consider- 
able material which may be of value in diagnosing vari- 
ous emotional and conduct trends. However, the specific 
elements of the test which are of value in various prob- 
lems can be discovered only by a detailed study and 
analysis of the test on the groups in question. Total 
scores on the test seem to represent such a complex com- 
bination of various sorts of reactions and tendencies that 
they indicate very little of practical value. 



CHAPTER XXII 


The Measurement of Character and 
the Moral Sense 

For purposes of discussion in this chapter we shall
define character and morality in terms of the ex- 
tent to which a person adheres to the established codes 
of good conduct and of social and moral standards. We 
shall not inquire to what extent such conduct depends 
upon innate qualities or upon acquired reaction tenden-
cies. There is the hypothesis, adhered to by some, that
individuals differ inherently in their sensitiveness to 
moral distinctions and in their disposition to subject 
themselves to codes of conduct. On the other hand, we 
may take the view that, aside from a minimum of intel- 
ligence as a basis for learning, all morality, character, and 
conduct conformities are the product of experience, en- 
vironment, or training. 

Reliable and valid conduct and character measure- 
ments should have a wide field of usefulness. In the 
field of crime they should give us a yardstick for meas- 
uring an important element in the makeup of the delin- 
quent. We cannot expect society to solve the problem 
of crime without methods for studying the criminal. 
From investigations already made, it seems fairly con-
clusive that mental ability and achievement measures
alone are inadequate. Perhaps we can expect a valuable
addition to our instruments of measurement in well 
worked out character tests. In respect to insanity, char- 
acter testing may help us to recognize early signs of 
breakdown in behavior, to arrive at better judgments as 
to degree or seriousness of the disorder, or to make a 
better prognosis as to chances of recovery. Measure- 
ment of vocational competence could well include con- 
duct and character tests. Up to the present, vocational 
measurement has paid most attention to measures of 
ability, which, to be sure, are of prime importance in the 
selection of a worker. But the worker must not only 
possess ability to do the work at hand ; he must also have 
certain character qualities of trustworthiness, regularity, 
and the like, to be of great value to his organization. 
Another field in which character measurement is needed
is in the schools in connection with character education,
social development, and teaching of good citizenship. 
Our modern schools are not interested alone in academic 
attainments. They are interested in certain other edu- 
cational outcomes reflected in better character and 
greater social values. These outcomes should not be 
neglected in the school measurement program. 

Two general methods have been used in measuring 
character and morality, and two different types of tests 
have been constructed. The one type of test is a be- 
havior test. It involves actually placing the subject to 
be tested in a situation in which he has a choice of be- 
havior, in which he may react with good action or bad 
action, as with honesty or dishonesty. The individual’s 
response in the situation is the basis for grading him. 
Such a method of measurement obviously has the dis- 
advantages of cumbersomeness in testing and often of 
artificiality in the test situations. The second method
consists in the devising of tests of conduct knowledge and
judgment.

The all-important question to be answered before we 
use such instruments of measurement concerns the rela- 
tionship between knowledge and conduct. Two authori- 
ties on psychological testing have stated the problem as 
follows: 

Very little experience of life is necessary to make one 
aware that a person may give lip service to a moral 
principle, but may repeatedly violate it in his conduct. 
One's ability to state a principle, or to pass a correct 
judgment as to what should be done in a specific situa-
tion, therefore, is no guaranty that he would act in ac-
cordance with the right principle. It is difficult to set
up a moral test, however, in which the individual shall 
be put in a situation which demands a moral choice. 
Because a test which demands verbal judgment is so 
much easier to administer and to devise, we may inquire 
further whether such a test may not have some value. 
Upon further consideration it appears that while a test 
which demands verbal response does not guarantee what 
the conduct of an individual shall be, it does give some 
information on the negative side of the case. If a person 
shows that he does not recognize a moral principle, we 
may be reasonably certain that he will not act in ac- 
cordance with it. A merely verbal test may, after all, 
then, be of some value.¹

It seems only natural that men should have turned to
tests of knowledge and reasoning for the diagnosis of 
conduct. Even the popular belief that action follows 
knowledge, that we reason out beforehand the course of 
action we are to pursue, gives warrant enough to investi- 
gators to experiment with tests of this type. This belief 
permeates our institutions, our educational theories and 
practices, and indeed all our relations with our fellows. 
Courts of criminal law decide responsibility for an act
on the basis of the defendant’s ability to discriminate
between right and wrong.²

¹ Freeman, Frank N., Mental Tests, Houghton Mifflin Co., Boston,
1926, p. 214.

The first set of tests of the behavior type was devised 
and used by Voelker.³ There were two series of ten tests
each. His studies of the tests included validity studies
based upon correlations between test results and teacher 
estimates of trustworthiness of students; and studies of 
effects of several weeks’ instruction in trustworthiness 
upon scores. His early results were encouraging if not 
conclusive. The pioneer work in character tests of the 
knowledge and judgment type is represented in Fernald’s 
Ethical Discrimination Test,⁴ in which individuals were
required to rank misdeeds in order of their gravity. 

The best and most extensive investigations of char- 
acter measurement published to date are those of the 
Character Education Inquiry conducted by Hartshorne 
and May and their collaborators. We shall briefly ex- 
amine their tests as samples in this field. For a complete 
consideration of the tests the reader is referred to the 
original reports.⁵

I. The Tests of Conduct Knowledge and Judgment 

The Character Education Inquiry made use of twelve 
tests of the knowledge and judgment type. These are
as follows:

² Symonds, Percival M., Diagnosing Personality and Conduct, The
Century Co., New York, 1931, p. 260.

³ Voelker, Paul F., The Function of Ideals and Attitudes in Social Edu-
cation, Teachers College, Contributions to Education, No. 112, New
York, 1921.

⁴ Fernald, G. G., “The Defective Delinquent Class Differentiating
Tests,” American Journal of Insanity, Vol. 68, 1912, pp. 524-694.

⁵ Hartshorne, H., and May, M. A., “Testing the Knowledge of Right
and Wrong,” The Religious Education Association Monographs, No. 1,
1927; Hartshorne, H., and May, M. A., Studies in Deceit, The Macmillan
Co., New York, 1928; and Hartshorne, H., May, M. A., and Shuttleworth,
F. K., Studies in the Organization of Character, The Macmillan Co., New
York, 1930.


1. An Opposites Test, a word knowledge test related 
to character and conduct matters. 

2. A Similarities or Cross-Out Test (similar in pur- 
pose to No. 1). 

3. An Ethical-Social Vocabulary Test. 

4. A Word Consequences Test, in which the subject 
indicates, for an action given, all likely conse- 
quences, the most likely consequence, the best 
consequence, and the worst consequence. 

5. A Cause and Effect Test, in true-false form, of 
which the following are samples: 

1. Good marks are chiefly a matter of luck    True False
2. Ministers’ sons and deacons’ daughters usually go wrong    True False

6. A Duties Test. The pupil marked an item Yes if 
he considered it his duty, No if he considered it not 
to be his duty, and f if sometimes his duty and 
sometimes not. Samples are: 

1. To help a slow or dull child with his lessons    Yes f No
2. To read the newspapers every day    Yes f No


7. Comprehension Test, a multiple-choice type that 
tests “what one should do.” Samples are: 

1. If someone asks to borrow your pencil,
   Tell him it’s broken
   Tell him that you just lost it
   Tell him that you don’t want to lend it
   Let him take it

2. If someone steals your lunch
   Steal another lunch to even it up
   Report it to the teacher
   Cry about it
   Say nothing about it

8. Provocations Test. This test represents an at- 
tempt to determine the extent to which judgment 
has the ascendancy over wishes, prejudices, and 
emotions. R encircled as an answer indicates the 
person taking the test thinks the action exactly 
right, Wr absolutely wrong, and Ex wrong but ex- 
cusable under the circumstances. Samples are: 

1. Helen noticed that nearly everyone in the class was cheating on a test, so she cheated too    R Ex Wr
2. Harry was a Christian boy. One day a Jewish boy called Harry “a dirty Christian.” Harry knocked him down.    R Ex Wr

9. A Foresight Test. Consequences of given situa-
tions are to be checked as “likely to happen,”
“might happen but not likely,” or “would not
happen.” A sample is:

1. John accidentally broke a street lamp with a snow-
ball. (Consequences to be checked in columns of
squares labeled as above.)

a. John was arrested and sentenced to six months in jail.
b. John said nothing about it, and people thought another boy had done it.
c. The emergency wagon had to come and repair it.
d. He thought it was such fun that he smashed a lot more lamps.
e. There was an accident there because it was dark.
f. Some people were cross about it, and John’s father got into trouble.
g. The glass went on the street and a child cut his hands on it.
h. The city had to pay for the lamp.





10. Recognitions Test. A list of acts must be classi- 
fied as Cheating (C); Lying (L); Stealing (S); 
Wrong, but not cheating, lying, or stealing (X) ; or 
Not wrong at all (/). Samples are: 


1. Bullying younger children    C L S X J
2. Using street-car transfers that are out of date    C L S X J


11. Applications Test. This is a test of ability to 
apply principles. A sample will make its nature 
clearer : 

1. Mary saw Helen cheating on an examination. She 
had to decide whether she would 
( ) (a) Report it to the teacher. 

( ) (b) Not report it to the teacher.

Here are the five rules, of which two apply to this
problem. Check two and only two in the spaces at the 
left of the numbers. 

( ) (1) Treat others as you would like to have them 

treat you. 

( ) (2) Be true to what is for the good of all, even 

when your own interests or those of your 
friends are involved. 

( ) (3) When you have wronged some one, ask to be 
forgiven. 

( ) (4) Be cheerful and uncomplaining when disap- 

pointed or hurt or in trouble. 

( ) (5) Do not think of yourself as more important 
than you are. 

After checking the two rules that apply to Mary, put 
a check before either (a) or (b) , according as you think 
it would have been right for her to tell or not to tell. 

12. A Test of Good Manners. This test, constructed
by Miss C. I. Orr for the Character Education Inquiry,
measures knowledge of current standards of





courtesy and good manners. Items are of the 

True-False, multiple-choice, and Yes-No forms. 

Samples are: 

1. In helping yourself to sugar always use 

your own spoon. ... True False 

2. When yawning, make no attempt to 

suppress it by covering the mouth. . . . True False 

How should we evaluate these tests? Are they really 
good tests of character, morality, and conduct? Should 
we recommend them for use in practical situations and 
put a great deal of dependence on the results of their administration?
Hartshorne and May attempted to evaluate
their tests on the basis of (a) reliability; (b) intercorrelations
of the various tests, or relation with each
other; (c) correlations of separate tests with a composite
of several tests; (d) correlations of the tests with intelligence;
(e) correlations of the tests with age; and (f)
correlations of the tests with actual conduct (cheating).
The tests in general possess satisfactory reliabilities, their 
reliabilities being only slightly less than one would expect 
for school achievement tests in similar form. So far as 
consistency of results is concerned, they can, therefore, 
be depended upon. The intercorrelations are mostly 
positive and of moderate degree, indicating some common 
factor running through the various tests. The correla- 
tions between separate parts and a composite of several 
of the tests are generally between .30 and .60. Perform- 
ances on the tests among school children are generally 
positively related to intelligence and negatively related 
to age. Unfortunately, from the standpoint of definite
recommendation of the tests for practical problems, they 
do not show high relationships with actual conduct, al- 
though the two are positively correlated. Perhaps we 





should, however, demand further investigations along 
this line, since the studies have been limited and in many 
cases based upon rather inaccurate subjective estimates 
of conduct. Symonds’ summary in discussing these tests 
seems a pertinent close to our consideration of them:

It is possible to measure knowledge and judgment 
with reference to conduct through the application of sev- 
eral useful tests which have been constructed for measur- 
ing health, Biblical knowledge, ethical knowledge, etc. 
These tests have very satisfactory reliability, comparing 
favorably with tests in the school subjects, similarly 
constructed. They correlate somewhat with each other 
and substantially with intelligence in general. The cor- 
relations with conduct are very low, so that with the less 
perfect tests they seem to fail to differentiate between 
normal and delinquent individuals. Conclusions from 
the research work done with these tests seem to indicate 
that answers reflect the code of the group in which the 
individual happens to be rather than any reasoned solu- 
tion to the problem situation presented for judgment. 
These codes seem to be group affairs, and there is a dis- 
tinct correlation between conduct and knowledge when 
groups as a whole are considered. The low correlations 
of knowledge and conduct for individuals indicate how 
distinct the two forms of activity are. On the other 
hand, when these correlations are compared with the 
correlations between different forms of conduct, there 
is ground for the suggestion that perhaps knowledge and 
judgment of conduct constitute after all the one force, 
however ineffective, that works toward integrating conduct.⁶

⁶ Symonds, Percival M., Diagnosing Personality and Conduct, The
Century Co., New York, 1931, p. 294.


II. Behavior Tests

The Character Education Inquiry tests of honesty and
trustworthiness will be described as examples of the behavior
or performance tests in character measurement.
The battery included nine types of tests, as outlined
below.

1. Testing dishonesty by the double testing technique. 

The method consists in giving two forms of a test equated 
in difficulty, one form being given under conditions which 
do, and the other under conditions which do not, permit 
cheating. Variations in score exceeding the normal vari- 
ation to be expected are considered evidence of dis- 
honesty. The Character Education Inquiry made use 
of four tests in this group — an arithmetic test, a sentence- 
completion test, an information test, and a word-knowl- 
edge test. Cheating on one set of each was made pos- 
sible by allowing the pupil to score his own paper by an 
answer key. 
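
The double testing technique can be illustrated with a minimal sketch. It assumes that the "normal variation to be expected" is estimated from a control group that took both forms without any chance to cheat, and that a difference counts as excessive when it lies more than k standard deviations above the control mean. The function name, the cutoff k = 3, and all scores below are invented for illustration; the text does not state the rule actually used.

```python
from statistics import mean, stdev


def flag_improbable_gains(control_diffs, pupil_diffs, k=3.0):
    """Return the positions of pupils whose gain exceeds the control norm.

    control_diffs: score differences between the two forms when neither
        form permits cheating (used to estimate normal variation).
    pupil_diffs: self-scored form minus supervised form, one per pupil.
    k: hypothetical cutoff in standard deviations (an assumption).
    """
    cutoff = mean(control_diffs) + k * stdev(control_diffs)
    return [i for i, diff in enumerate(pupil_diffs) if diff > cutoff]


# Invented data: small differences in the control group,
# two suspiciously large gains among the pupils.
control = [0, 1, -1, 2, 0, -2, 1, 0]
pupils = [1, 0, 9, 2, 12]

print(flag_improbable_gains(control, pupils))
```

With these invented figures only the third and fifth pupils exceed the cutoff. A stricter or looser k would change who is flagged, which is why the choice of cutoff needs to be settled on control data before the test is used.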

2. Speed tests. The technique is similar to that just 
described. In these tests, however, performance is de- 
termined by speed rather than power. The subject can 
cheat by adding extra items to the answers when allowed 
to score his own paper. Hartshorne and May utilized 
speed tests in addition, number-checking, letter cancel- 
lation, digit cancellation, digit-symbol substitution, and 
dot-in-square placement. 

3. Coordination tests. These included tracing tests,
such as tracing around squares, keeping within a narrow
margin; and a circles test, as illustrated in Fig. 22.⁷
Under the test conditions, performance must be done
with the eyes closed. Cheating or dishonesty was evidenced
by achievements beyond the limits of probability
without “peeping.” The limits of probability were determined
by trials on control groups. The method of
these tests is known as the improbable achievement
technique.

⁷ Reproduced by permission of the Macmillan Company, New York.

4. Puzzle tests. The puzzle tests utilized in the
Hartshorne-May studies were also based on the improbable
achievement technique. The puzzles used, while
appearing simple, could not be solved without cheating.
Correct solutions, beyond certain limits allowable by
chance, indicated dishonesty.

Fig. 22. Circles Test for Dishonesty. (Directions: Wait for the
signal for each trial. Put the point of your pencil on the cross at the
foot of the oval. Then, when the signal is given, shut your eyes and
put the figure 1 in each circle, taking the circles in order. For the
second trial, put the figure 2 in each circle. For the third trial, the
figure 3, etc. After each trial put a check mark in the score box under
the number of each circle you succeeded in striking, count the checks,
and enter the total in the column headed T at the right of the score
box. After the last trial add up Column T. This is your score. The
maximum score is 50.)

5. Lying tests. Two types of tests to discover lying 
were utilized in the studies we have been discussing. 
One test consisted of questions regarding tests taken 
previously (those described under No. 1). The ques- 
tions asked directly whether the subject had cheated. 
Without the subject’s knowledge, the truth of his answers 
could be checked. This test was considered to measure 
lying to avoid disapproval. Another type of test was
used to measure lying to gain approval. This consisted
of a number of questions, to be answered Yes or No,
about acts which, although they have widespread social
approval, are rarely actually done. Sample questions are:

Do you usually report the number of a car
you see speeding? . . . . Yes No

Do you always obey your parents cheerfully
and promptly? . . . . Yes No

Do you read the Bible every day? . . . . Yes No

A critical point, separating honesty from lying, was
established by having adult graduate students answer the
questions so that these truthfully represented their childhood.
Children who received scores considerably above
this critical point were judged to have lied to gain approval.
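
The scoring of this second lying test can be sketched as follows. The sketch assumes that the score is simply the number of Yes answers and that "considerably above" the critical point is expressed as a fixed margin. Both function names, the margin, and the example figures are hypothetical; the text gives neither the critical value nor the margin actually used.

```python
def approval_score(answers):
    """Count the Yes answers given on the questionnaire."""
    return sum(1 for answer in answers if answer.strip().lower() == "yes")


def judged_lying(answers, critical_point, margin=0):
    """Return True when the score exceeds the critical point by more
    than `margin` (a hypothetical stand-in for "considerably above")."""
    return approval_score(answers) > critical_point + margin


# Invented example: ten questions, a critical point of 6, a margin of 2.
child = ["Yes"] * 10
print(approval_score(child), judged_lying(child, critical_point=6, margin=2))
```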

6. Homework tests of cheating. A form of the word-knowledge
test described under (1) was given to be done
at home. Instructions against obtaining help from the
dictionary or another person were given twice. Scores
considerably above the performance on the test in the
classroom were considered evidence of cheating or dishonesty.




7. Athletic contests. Deceptive behavior was meas- 
ured in these tests by the subject’s inclination to fake a 
good record. The physical tests used were the dyna- 
mometer test, the spirometer test, the chinning or pull- 
up test, and the standing broad jump. Certain prizes 
were offered for good performances. Each child was al- 
lowed three trials, the best out of the three being noted 
by an examiner without the child’s knowledge. The 
child was then encouraged to make five additional trials 
immediately after the three without any supervision, and 
to report them later to the examiner. Better records 
than in the first three are highly improbable, because 
practice effect is practically absent in the tasks used, and 
fatigue effect is considerable. Gains stated by the child 
above a normal variation indicated deception. 

8. Party games tests. Cheating was observed in per- 
formances in such games as “pinning the tail on a 
donkey” and “bean relay race.” The game situations 
were so arranged that cheating could occur and could be 
checked without the subject’s knowledge. 

9. Money tests. Situations were worked out in which
the subject was given the opportunity to take money, dishonestly,
without apparent danger of detection. Systems of
checking made the detection of dishonesty possible.

How do these various direct-method conduct measures 
stand the tests of a good measuring device? The studies 
by Hartshorne and May indicate that the tests are reli- 
able as judged by correlations between original tests and 
retests. The tests, however, raise grave doubts regarding 
their general value on two counts. They have not been 
demonstrated to correlate to any close degree with gen- 
eral behavior; and they seem to test only very specific 
situations, or conduct only under very specific circum- 
stances. The studies of the tests indicate that a single 





test of deceit or a single test of honesty bears little rela- 
tionship to deceit or honesty in general. The reader will 
probably be impressed by Symonds' excellent summary
of the value of such tests of conduct: 

From a review of all the skillful and ingenious methods 
for testing conduct directly that have been devised, the 
conclusion stands out above all others that conduct is 
very specific. When exactly the same test is repeated, 
the correlation is fairly high, perhaps around .70 or .80. 
But when the situation is changed ever so slightly, the 
correlation between the two similar tests drops, and long 
before the two situations seem different enough to be 
called by different names, the correlation has dropped 
close to zero. A battery of tests designed to test such 
a trait as persistence, or aggressiveness, or speed of deci- 
sion gives results so varying and with so little consistence 
as to furnish little warrant for assuming the presence of 
such a trait. 

These low intercorrelations also help to explain the low 
correlations of these tests with outside criteria. Since the 
tests are so specific as to fail to correlate with tests bear- 
ing the same name, naturally they could not be expected 
to correlate to any degree with other factors which are 
admittedly dissimilar. 

The conclusions that one draws from these results are 
not very encouraging. There are four possible things 
that may be done with performance tests in the measure- 
ment of conduct: 

1. They may be discarded as being so specific as to be 
useless for all practical purposes. 

2. Tests may be devised that apply to the specific 
situation in which they will be used. Since tests are so 
very specific, test situations must be set up which ap- 
proximate as closely as possible the situations in office, 
industry, school, or institution where they will be used. 

3. Since no one test measures a given quality ade- 
quately, a variety of tests representing a range of situa- 
tions in which the trait occurs may be devised so that the 





composite will be a satisfactory measure of the trait in 
question. . . . 

4. A fourth method of using these tests is to pick 
out one test for each of a number of different traits and 
weigh these tests in combination in order best to predict 
success in business or school. . . . 

In all of these alternative methods of using performance
tests certain practical problems of cost, difficulty
of administration, difficulty in applying statistical tech- 
niques, and the like, arise which very definitely limit the 
use of these tests. It is often expensive to test in the 
practical situation. Certain ingenious devices must be 
applied which eat into time and money, as May and 
Hartshorne found out. Again, to give a well-rounded 
battery of tests is also expensive. Finally, one who 
plans to use the regression equation technique must count 
the cost beforehand. 

Performance tests have a real and valuable place at 
the present time in experimental work. As used by 
Hartshorne and May, they have revealed facts that were 
obtainable in no other way. But there must be considerable
further development before tests of this type become
a feasible tool in clinical work.⁸

⁸ Symonds, Percival M., op. cit., pp. 352-354.

III. Other Types of Tests

Although the tests we have described represent the
best and most useful of the tests that have been devised
for measuring character, they do not constitute a complete
list of types of character measures. The Commission
on Character Education of the National Education
Association lists eight types of tests and measures that
have been used in character measurement.⁹ These are
(1) Measures based on physical factors; (2) Measures
based on significant knowledge; (3) Measures of opinion;
(4) Self-descriptive measures; (5) Disguised measures;
(6) Measures of conduct in controlled situations; (7)
Significant facts as character measures; and (8) Reputation
measures. The second and sixth of these designate
the types of tests we have described from the Character
Education Inquiry tests. The fourth and fifth represent
adaptations of personality tests, such as are described in
Chapter XXI, to the study of character traits. The last
two, if more than rough, uncontrolled personal estimates,
represent the application of rating-scale technique to
evaluating character and reputation.

⁹ Character Education, Tenth Yearbook, Department of Superintendence,
National Education Association, Washington, D. C., 1932.

Measures based on physical factors, as discussed by the 
Commission on Character Education, include (a) body 
type, (b) appearance, (c) motor reactions, as handwrit- 
ing, (d) biochemical tests, and (e) physiological changes 
during emotion. The first three of these are generally 
ancient attempts to tell character which have not stood 
the tests of validity, and are now relegated to the pseudo- 
scientific. Biochemical methods offer hope, and are cer- 
tainly worthy of investigation. For the most part, at 
present, the chemical processes of the body are not well 
enough understood and easily enough measured to be 
useful psychologically. We know that endocrine secre- 
tions do influence basic attitudes and conduct, although 
no specific form of immoral or antisocial conduct is trace- 
able to any particular gland. We know, also, that certain 
infections, toxins, and drugs are clear-cut causes of per- 
sonality disorders. Fairly recent studies suggest rela- 
tionships between personality and character differences 
and such biochemical factors as salivary alkalinity, uri- 
nary acidity, creatinine in urine, blood phosphorus and 
blood calcium. Further research may eventually prove 
these biochemical tests to be of practical value in judg- 
ing phases of character. 





Physiological changes during emotion have been uti- 
lized to indicate the emotional effect of confusion and 
hence to detect attempts to deceive or lie. The physio- 
logical changes studied include blood pressure, breathing, 
reaction time, and the psychogalvanic reflex. These are 
discussed and evaluated in Chapter XXVI. 

Measures of opinion and attitude, particularly toward 
matters of social welfare, law abidance, and morality, 
should certainly have a bearing upon character testing. 
One’s opinion or attitude is very likely to color one’s 
conduct; in many instances it is likely to be the most 
important motivating force in conduct. Thurstone of 
the University of Chicago has embarked on a program 
of working out and statistically evaluating a considerable
number of attitude scales. Scales published or in prepa- 
ration include, among a number of others, scales for 
measuring attitude toward God, the church, birth control, 
movies, law, Sunday observance, criminals, capital pun- 
ishment, and freedom of speech. Each scale consists of 
a list of statements of the “pro” and “con” point of view, 
with reference to the subject of the scale. The person 
filling out the scale checks those statements with which 
he agrees. Each statement has a scale value indicating 
the extent to which the statement is favorable toward the 
subject of the scale. Attitude or opinion is indicated by 
the sum of scale values of those statements checked. 
For the method of evaluating the statements of the 
scales, the reader is referred to the original writings. 
The scope of this book does not allow an elaboration on 
this.¹⁰
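
The scoring rule just described can be put in a few lines. The sketch follows the wording of the text, which takes the sum of the scale values of the statements checked. It also prints the mean of those values, a variant added here only for comparison, since a raw sum grows with the number of statements a person happens to check. The statement labels and scale values are invented and are not taken from any published scale.

```python
def attitude_score(scale_values, checked):
    """Sum the scale values of the statements the respondent checked."""
    return sum(scale_values[statement] for statement in checked)


# Invented scale: a higher value means a statement more favorable
# toward the subject of the scale.
scale_values = {"A": 1.5, "B": 4.0, "C": 7.5, "D": 9.0}
checked = ["C", "D"]

total = attitude_score(scale_values, checked)
print("sum of scale values:", total)
print("mean scale value:", total / len(checked))
```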

¹⁰ Thurstone, L. L., and Chave, E. J., The Measurement of Attitude,
University of Chicago Press, Chicago, Ill., 1929.

The validity of attitude and opinion scales is directly
related to factors which insure an honest expression in
the subject's answers. Opinion tests are of most value
when there is an important and genuine motive leading 
the persons being tested to give a true picture of their 
opinions. The procedure of giving the tests, including 
the manner of introducing them and explaining them, is 
as important in affecting the worth of the test as what 
is included in the test items themselves. A discussion 
by Thurstone is interesting in this connection: 

There comes to mind the uncertainty of using an opin- 
ion as an index of attitude. The man may be a liar. 

If he is not intentionally misrepresenting his real attitude 
on a disputed question, he may nevertheless modify the 
expression of it for reasons of courtesy, especially in
those situations in which frank expression of attitude 
may not be well received. This has led to the suggestion 
that a man’s action is a safer index of his attitude than 
what he says. But his actions may also be distortions 
of his attitude. A politician extends friendship and 
hospitality in overt action while hiding an attitude that 
he expresses more truthfully to an intimate friend. 
Neither his opinions nor his overt acts constitute in any 
sense an infallible guide to the subjective inclinations and 
preferences that constitute his attitude. Therefore, we 
must remain content to use opinions, or other forms of 
action, merely as indices of attitude. It must be recog- 
nized that there is a discrepancy, some error of measure- 
ment as it were, between the opinion or overt action that 
we use as an index and the attitude that we infer from 
such an index.¹¹

¹¹ Thurstone, L. L., “Attitudes Can Be Measured,” American Journal
of Sociology, Vol. XXXIII, No. 4.




Part VII 

PHYSIOLOGICAL MEASUREMENTS 
IN PSYCHOLOGY 




CHAPTER XXIII 


Measurement of Fatigue 


Almost every example of sustained production or
performance involves the problem of fatigue. The 
applied psychologist has met the problem in the work- 
shop, where fatigue has been the factor accounting for 
decreased production, occurrence of accidents, or adverse 
working attitude; in the office, where fatigue has limited 
accuracy of performance or has called for unnecessary 
energy expenditure to maintain work at the required ef- 
ficiency; and in the classroom, where fatigue has influ- 
enced learning ability. More recently, fatigue studies of 
automobile drivers have been directed toward solving 
the problem of street and highway accident prevention. 
Finally, fatigue studies have interested the clinical psy- 
chologist because they throw light upon the basic causes 
of neurasthenic or similar mental conditions. 

All of these studies have emphasized the need for 
quantitative measures of fatigue. We can expect rela- 
tively little to be accomplished in increasing production 
by eliminating causes of fatigue until quantitative 
studies of fatigue can be made. Little can be learned 
about the relationship between accidents and fatigue un- 
til there are available quantitative indications of degree 
of fatigue. It is difficult to decide upon the importance 
of fatigue as a factor in mental disorders and disease 
without means of indicating the amount of fatigue that 
has accumulated. 




The means of measuring fatigue which we find in psy- 
chological literature can generally be grouped under 
three methods. The earliest psychological methods 
measured fatigue in terms of decreased production or de- 
creased performance. The various studies depicting “fa- 
tigue curves” throughout a day of work or a day of 
activity are of this type. These measurements err in as- 
suming that production is inversely proportional to 
fatigue. In many instances this is not true. Other 
mental or emotional factors often interfere with produc- 
tion when actual fatigue as measured by more accurate 
methods is slight. Such measurements are likely to con- 
tribute very little to the solving of the problems in which 
fatigue is a factor. 

A second method of measuring fatigue, which is met 
in a few of the psychological studies, is subjective; fa- 
tigue is measured by a subjective feeling of tiredness, re- 
corded by the subjects being studied. Such a method of 
measuring fatigue, in the first place, is open to all the 
objections pertaining to subjective measures in general, 
and in addition it is open to the objections just men- 
tioned in connection with measurements in terms of pro- 
duction. Subjective feelings of tiredness are not by any 
means perfectly correlated with actual fatigue. They 
are not even always related to amount of production. 
Poffenberger demonstrated the lack of correlation be- 
tween feelings of tiredness and output in an experiment 
in which twelve subjects did mental work continuously 
for five and one-half hours, indicating at intervals the 
quality of their feelings on a scale ranging from “ex- 
tremely good” to “extremely tired.” At these same in- 
tervals records were taken of their production. The 
relationship between the two is shown in Fig. 23. 

Fig. 23. The Relationship Between Production and Feelings of
Tiredness.

The third type of measurement of fatigue is one in
terms of physiological state of the individual. This
method seems to possess many advantages over the other
two, in that it can be made objective, it is directly related
to the nature of the fatigue itself, and it is independent
of the production or accomplishment which
usually constitutes the problem for which fatigue is
studied.

Further back than we can find written records, certain
physiological effects of fatigue have been noted. Such
obvious manifestations of strenuous physical exercise as
increased breathing, increased heartbeat, and increased 
perspiration could not fail to be noticed. The exercise, 
however, has to be rather strenuous for such changes to 
be observable without more exact instrumental measure- 
ment. Present problems demand quantitative measure- 
ments and often the detection of changes produced under 
circumstances of rather mild exercise. 

Physiological studies of the blood under various condi- 
tions led, long ago, to a theory of fatigue as a condition 
produced essentially by an accumulation of fatigue products
in the blood, such an explanation being credited by
experiments demonstrating that a rested animal can be 






fatigued by injecting into its blood stream the blood from 
a fatigued animal. Fatigue is to be looked upon as a 
general body condition in which observation of the following
may be stressed: the muscles, as the primary seat
of energy transformations; the blood, as a vehicle of en- 
ergy supplies and waste removal; the heart and blood 
vessels, as the distributing mechanism; the respiratory, 
alimentary, and excretory systems, as sources of supply 
and means of waste removal; and the endocrine glands 
and nervous system, as the coordinating and regulating 
mechanisms. Quantitative measurements of fatigue of 
the physiological type are based upon changes in these 
parts of the body. Not all the changes can be measured 
with our present devices, and not all the measurements 
that can be made are practical for ordinary psychological 
application, because of their complexity and technical 
nature or, in some instances, because of the discomfort 
which their application may cause the subject being 
measured. 

In the remainder of this chapter are indicated those 
means of measuring fatigue which have proved of most 
value and are most practicable. The discussion is based 
upon the results of an experimental investigation conducted
by the Psychology Department at the George
Washington University.¹ The study aimed primarily to
demonstrate the feasibility of utilizing certain physiological
measurements in denoting the amount of fatigue
produced in an individual. The subjects used in the
study were university men between the ages of 18 and
35. “Normal” physiological measurements were made
on each subject before the experiment. The subjects
then engaged in a fatigue-producing occupation, and the
same physiological measurements were made after fatigue
had set in, in some instances the measurements being
taken at varying stages of the fatigue.

¹ Moss, F. A., Roe, J. H., Hunter, O. B., French, L., and Hunt, T.,
“Measurement of Fatigue by Physiological Methods,” Journal of Experimental
Psychology, Vol. XIV, No. 4, Aug. 1931.

The method of producing fatigue consisted in riding a 
bicycle ergometer. This is a machine originally used by 
Benedict and described by him as follows: 

The rear wheel of a bicycle was replaced by a copper 
disk 40.5 centimeters in diameter and 6 millimeters 
thick. This disk is mounted in such a way that it ro- 
tates freely on a ball-bearing axle. A small sprocket 
wheel is attached to the axle and is in turn connected 
in the usual manner with the large pedal sprocket wheel 
by means of a sprocket chain. A wooden frame sur- 
rounds the periphery of the disk, and to the upper part 
of the frame is attached an electro-magnet. Binding 
posts are attached to the magnet to connect with the 
electric cable leading to the observer’s table, where 
strength of current can be regulated with great accu- 
racy. The field of the magnet is so extended that the 
copper disk rotates in the center of the field with but 
a very small air gap between the surface of the disk and 
the surface of the magnet, and hence the resistance is 
wholly that of magnetic induction. A current of 1.25 
amperes induces large eddy currents in the copper disk 
to such an extent that the resistance is very noticeable.²

² Benedict, Francis G., and Carpenter, Thorne M., The Influence of
Muscular and Mental Work on Metabolism, United States Department
of Agriculture, Bulletin No. 208, 1909, Washington, D. C.

By varying the strength of the magnetic field, regulated
through a rheostat and ammeter, the amount of
work performed in pedalling can be varied. The number
of revolutions made is measured by an electric counter
connected with the sprocket wheel. The ergometer has
been calibrated in terms of calories, the work performed
per revolution being .0231 with a current of 1.26 amperes
and 70 revolutions per minute, this being the speed
and amperage used in the present experiment. During
the experiment, each subject rode the bicycle for 15 or
20 minutes, steadily, at the rate indicated.
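
From the calibration figures just given, the total work of a ride follows by simple multiplication: work per revolution times revolutions per minute times minutes. For a 20-minute ride this is .0231 × 70 × 20 = 32.34, and for a 15-minute ride .0231 × 70 × 15 = 24.255, in whatever calorie unit the calibration used. The short sketch below repeats the arithmetic. The function name is hypothetical, and the sketch assumes the subject held the stated rate throughout.

```python
CAL_PER_REV = 0.0231  # work per revolution, as stated in the text
REVS_PER_MIN = 70     # pedalling rate, as stated in the text


def work_done(minutes, cal_per_rev=CAL_PER_REV, revs_per_min=REVS_PER_MIN):
    """Total work for a steady ride, in the calorie unit of the calibration."""
    return cal_per_rev * revs_per_min * minutes


for minutes in (15, 20):
    print(minutes, "minutes:", round(work_done(minutes), 3))
```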

1. Blood pressure. Blood pressure is related to the
force with which the circulatory system is supplying
blood to the various parts of the body. Since activity,
exercise, and fatigue put extra demands upon the circulatory
system, we might expect blood pressure to be proportional
to the demands, or in other words, to degree
of fatigue. Blood pressure measurements in the experiment
were made using an ordinary sphygmomanometer.
During 20 minutes’ exercise on the bicycle, blood pressure
curves were obtained like those shown in Fig. 24.
Up to about 8 or 10 minutes of exercise, the subjects
showed an increasing blood pressure, with increasing exercise.
Beyond this point, blood pressure decreased until
at the end of 20 minutes, for many of the subjects,
it had approached the initial pre-exercise level. The
blood pressure curves generally show a rapid rise to a
maximum height, then a gradual fall. Onset of fatigue
seemed to follow the downward slope of the curve. The
more enduring subjects showed maximum height later.

Fig. 24. Blood Pressure Curves in Fatigue (Three Subjects).
Axis label: exercise minutes.
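
The reading of such a curve can be sketched in code: find the minute at which blood pressure reaches its maximum, since the fall that follows is where fatigue seemed to set in. The readings below are invented to resemble the shape described (a rise for 8 or 10 minutes, then a gradual fall) and are not data from the experiment; the function name is likewise hypothetical.

```python
def peak_minute(minutes, pressures):
    """Return the exercise minute at which pressure is highest
    (the earliest such minute if the maximum is tied)."""
    index = max(range(len(pressures)), key=pressures.__getitem__)
    return minutes[index]


# Invented readings taken every two minutes of exercise.
minutes = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
subject_a = [120, 138, 150, 158, 162, 160, 154, 146, 138, 130, 124]
subject_b = [118, 130, 140, 148, 154, 158, 156, 150, 142, 134, 126]

print("subject A peaks at minute", peak_minute(minutes, subject_a))
print("subject B peaks at minute", peak_minute(minutes, subject_b))
```

On the interpretation given in the text, the invented subject B, whose maximum comes later, would be the more enduring of the two.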

2. Carbon-dioxide combining power of the blood. 
Under normal conditions the human blood contains a 
reserve supply of fixed alkalies or bases. It is this “al- 
kali reserve” which keeps the hydrogen ion concentration, 
or true reaction of the blood, slightly alkaline. In proc- 
esses of metabolism in the body an excess of acid radicles, 
from such things as the combustion of carbon substances, 
is continuously produced. During exercise there is a 
marked increase in these acid radicles, though in normal 
subjects they are not sufficient to make the reaction of 
the blood acid. This condition, however, can be meas- 
ured by the carbon-dioxide combining power of the 
blood, which is lowered by depletion of the supply of 
fixed bases. Since the fixed bases of the blood constitute 
the chief means of transporting carbon dioxide from the 
tissues to the lungs, depletion of the supply of fixed bases 
reduces the capacity of the blood to carry carbon dioxide. 
This condition so produced leads to an accumulation of 
carbon dioxide in the tissues and consequent blocking of
the processes of oxidation, factors important in produc- 
ing the condition of fatigue. 

3. Blood sugar. Blood sugar constitutes the fuel 
material for activity and energy production. During
periods of demand it is liberated in relatively large quan- 
tities from its storage place, principally the liver. As 
activity continues, its supply in the body becomes less 
and less. Blood sugar records for the subjects studied 
showed considerable variation, dependent upon the stage 
of fatigue at which measurement was made. Increased 
amounts of blood sugar were found in the initial stages 
of fatigue or exercise, but decreased amounts were found
at later periods, when it can be assumed that the extra
liberation of blood sugar in the body had been more than 
offset by the extra demand for it. Equal exercise does 
not affect everyone equally with respect to blood sugar 
changes; and since individual differences in blood sugar 
reactions are wide, such records seem hardly practical as 
an index or measurement of fatigue. 

4. Metabolism. The term “metabolism” usually re- 
fers to the sum total of energy changes going on in the 
body. Since energy changes going on in the body are 
increased in proportion to the amount of exercise or work 
being done, it might be supposed that measures of met- 
abolic rate as compared with resting rate would consti- 
tute good indications of amount of fatigue. Metabolism 
changes in the George Washington experiment were 
measured by a Sanborn graphic metabolism machine. 
Measurements of “basal metabolic rate” showed an av- 
erage increase of 54.5 per cent. Fig. 25 shows a typical 
tracing before and after fatigue. Note in the lower rec- 
ord, after fatigue, the longer strokes, signifying deeper 
breathing; the greater number of strokes, signifying more
frequent inhalations; and the greater slope of the curve, 
indicating increased consumption of oxygen. 

5. Blood cell studies. The red blood cells, particu- 
larly their hemoglobin content, are concerned in the car- 
rying of oxygen, which must be supplied in extra 
quantities during and immediately following processes 
which produce fatigue. White blood cells exhibit a pro- 
tective reaction in increased numbers whenever there 
are injurious or toxic substances to be eliminated from 
the body. Since injurious waste products seem to be the 
most important causal factor of fatigue, it would seem 
logical to find these cells increased in proportion to the 
fatigue. The experimental study showed an average
increase in red blood cells of about 6 per cent and of
white blood cells of about 57 per cent.

Fig. 25. Metabolism Records Before and After Fatigue.

The various physiological changes produced by fatigue
are summarized in Fig. 26. The two types of measure-
ment that seem to offer the most reliable quantitative
indications of fatigue are measurement of metabolism
and measurement of carbon-dioxide combining power of
the blood. Both of these seem to be sensitive to rela-
tively slight amounts of fatigue, and both are relatively
independent of the transitory mental and emotional influ-
ences which make some of the other methods, as that
for blood pressure, rather impractical.
In the utilization of any of these physiological measure-
ments of fatigue, the desirability of obtaining nor-
mal measurements for subjects to be studied is clear.

Fig. 26. Physiological Changes Produced by Fatigue.
(Bar chart of percentage increase or decrease in pulse
count, pulse pressure, respiratory rate, white blood
cells, metabolism, CO2 combining power, diastolic blood
pressure, blood pressure, blood sugar, and red blood
cells.)
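The percentage changes reported in this section presuppose a resting value for each subject. The arithmetic can be sketched in Python as follows; the function name and the sample cell counts are hypothetical, chosen only to give changes of the size mentioned in the text.

```python
def percent_change(resting, after_exercise):
    """Return the change from the resting value as a percentage of it."""
    if resting == 0:
        raise ValueError("resting value must be non-zero")
    return (after_exercise - resting) / resting * 100.0


# Hypothetical counts per cubic millimeter for a single subject.
red_rest, red_after = 5_000_000, 5_300_000
white_rest, white_after = 7_000, 10_990

print(round(percent_change(red_rest, red_after), 1))      # 6.0
print(round(percent_change(white_rest, white_after), 1))  # 57.0
```

A decrease comes out as a negative value, so the same function serves for measures that fall with fatigue.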


CHAPTER XXIV 


Laboratory Tests in Mental Disorders 

WE MIGHT expect to find the basic differentiation
between the mentally disordered or insane person 
and the normal person in the chemistry of their bodies. 
For certain types of disorders this is possible. The hal- 
lucinated, tremulous alcoholic has a system saturated 
with a toxic chemical; the nervous, irritable hyperthyroid
is the result of too-powerful action of the chemical from 
the thyroid gland; the stupid, poorly developed cretin is 
the result of too little of this same glandular chemical; 
the silly, grandiose paretic patient has his insane charac- 
teristics because of an infectious agent that has invaded 
his blood stream and nervous system; the neurasthenic or 
neurotic individual might be defined in terms of the ac- 
cumulated chemical waste products of an overactive life. 
All these examples of mental disorders, and many others, 
furnish psychological problems which can be really un- 
derstood only by the utilization of physiological measure- 
ments. Whatever information may be gained through 
quantitative studies of intelligence and personality, in 
which the measuring instruments we have already dis- 
cussed are utilized, should be supplemented by whatever 
physiological tests are available. It is only to be re- 
gretted that these tests cannot be extended to many of 
the so-called “functional” mental disorders, as yet so
little known from the standpoint of their chemistry. 

In the subsequent paragraphs we shall describe some
of the more important tests that have been found of
service in the field of mental disorders or abnormal
psychology.


I. Blood Tests 

Should we try to observe one thing that would give us 
the most information about the body as a whole, we 
should select the blood stream. The blood stream carries 
chemicals which nourish all the cells of the body; it 
carries the oxygen necessary for combustion of the body 
fuel; it carries the enzymes that stimulate the digestive 
processes in the body; it carries the endocrine products 
which speed up or slow down the body processes; it car- 
ries various waste products which accumulate in the body 
from fatigue; and, lastly, it bears the burden of many 
toxic products which either gain entrance into the body 
from the outside or arise from the body itself. It is not 
surprising, then, that we should examine the blood in an 
attempt to disclose the basis of a mental disorder. The 
more important blood tests made in connection with 
mental disorders are described below. 

1. Tests for syphilitic infection. The two tests com- 
monly used for discovering syphilitic infection are the 
Wassermann reaction and the Kahn tests. When an in- 
dividual has syphilis, the entrance of the invading syphi- 
litic germs causes the body to build up in the blood stream 
certain counteracting substances of a protective nature in 
an attempt to rid the body of the invading organisms. 
The laboratory tests for syphilis are, in principle, tests 
for discovering the presence of these counteracting bodies 
in the blood stream. 

The importance of these tests lies in their diagnostic 
value in paresis and tabes dorsalis. The former is a men-
tal disorder likely to be accompanied by marked mental
deterioration and grandiose delusions, and the latter is a
disorder showing many nervous symptoms. Some 95 per 
cent or more of paretics show a positive reaction in the 
blood. Between 50 and 75 per cent of tabes dorsalis 
cases yield a positive blood Wassermann. It should be 
remembered that a case with a negative blood Wasser- 
mann may show a positive reaction in the spinal fluid. 
Hence the importance of the spinal fluid test in nervous 
cases. 

2. Blood cell tests. These include tests to determine 
the number of red blood cells per unit volume of blood;
the amount of hemoglobin in the red cells; and the num- 
ber of white cells per unit volume of blood. The first 
two of these are important primarily because of the im- 
portance of the red blood cells in carrying oxygen to the 
various parts of the body. We might expect to find 
mental as well as “physical” disturbances when the 
efficiency of this transportation of oxygen is interfered 
with by a deficiency of red cells or by low hemoglobin 
content. White cell measurements are of importance 
primarily in infectious disorders, and since a considerable 
number of mental disturbances are infectious in origin, 
white cells are often of importance in diagnosing and 
dealing with mental abnormalities. 

We may quote one example of such mental 
disturbances: 

A boy, age 13, who had previously been getting along 
very well in school suddenly began to have trouble with 
his work. He found it difficult to go to sleep because he 
constantly heard voices which taunted him and said 
mean things to him. He had been hearing these voices 
for several weeks when first examined. On examination 
he was found to be introverted and emotionally flat- 
tened, apparently living in a world of his own creation. 
The blood tests gave him a hemoglobin of 48, and a red
cell count of 2,100,000 (5,000,000 are normal). He was
put on heavy doses of liver extract to build up his blood 
condition, and as the hemoglobin gradually increased, 
his mental condition improved. After the hemoglobin 
reached 70 no “voices” were heard. His school work 
improved, and he passed without difficulty to the next 
grade. It has now been almost a year and he shows no 
signs of a relapse.¹

3. Blood sugar. Chemical analysis of the blood to 
determine the amount of the blood sugar is of particular 
importance in diabetes. Since mental symptoms and 
coma often result from diabetes, the blood sugar test can 
be considered as having some value in the interpretation 
of mental disturbances. Recent studies of blood sugar 
curves, after the subject has been fed quantities of glu- 
cose, indicate that perhaps the bodily reaction to in- 
creased sugar varies according to types of mental 
symptoms.²

4. Carbon-dioxide combining power. Laboratory 
tests are available for determining the capacity of the 
blood for carrying carbon dioxide. Such tests are indica- 
tive of the state of fatigue or exhaustion existing in the 
body. They may be of value in diagnosing and suggest- 
ing treatment for certain cases of neurasthenia. 


¹ Moss, F. A., and Hunt, T., Foundations of Abnormal Psychology,
p. 390, Prentice-Hall, Inc., New York, 1932.

² Drury and Farran-Ridge, “Types of Blood Sugar Curves Found in
Different Forms of Insanity,” Journal of Mental Science, Vol. 71 (1925),


II. Spinal Fluid Tests

The whole nervous system is bathed in a fluid which
serves primarily to protect it from shocks. Since this
fluid is in such intimate relation with the nerve tissue,
it is not surprising that we should find reflected in its
chemical composition disturbances that take place in the
nervous system. For spinal fluid tests a small amount
of spinal fluid is withdrawn through a puncture made in
the lumbar region of the spinal column.

1. Wassermann reaction. This is a test of syphilitic 
infection in the spinal fluid similar to the blood syphilitic 
tests already described. 

2. Lange’s colloidal gold test. In a colloidal gold
test, ten test tubes are arranged in a row. Each test tube
contains five cubic centimeters of colloidal gold solution
and one cubic centimeter of spinal fluid diluted with
physiological salt solution, the dilution ranging from
1:10 in the first tube to 1:5120 in the tenth tube. After
the tubes have stood for twenty-four hours, readings
for all ten tubes are taken. The intensity of reactions in
the successive tubes is determined by color. Normal
spinal fluid shows the original salamander red color in
all tubes. The negative reaction, or zero reading, is sal-
amander red in color, as in the beginning; a slightly pos-
itive reaction, or reading of 1, is reddish blue; the next,
a reading of 3, is blue; the next, a reading of 4, is pale
blue; and the strongest intensity of reaction, marked by
complete precipitation of the colloidal gold, reading 5,
is colorless. The readings from the test tubes are com-
monly shown by colloidal gold curves. Fig. 27 shows
the difference in types of curves obtained in three typi-
cal mental disorders.

Fig. 27. Colloidal Gold Curves in Various Mental Disorders.
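The layout of the ten tubes can be sketched in a few lines of Python. The text gives only the first and last dilutions (1:10 and 1:5120); that each tube doubles the dilution of the one before is an assumption, though it is the only constant ratio that fits ten tubes between those end points. The color table holds only the readings described above, and the sample readings are hypothetical, not taken from any of the curves in Fig. 27.

```python
# Colors for each reading as given in the passage; a reading of 2 is
# not described there, so it is deliberately left out of this table.
READING_COLORS = {
    0: "salamander red (negative)",
    1: "reddish blue",
    3: "blue",
    4: "pale blue",
    5: "colorless (complete precipitation)",
}


def dilution_series(first=10, tubes=10):
    """Dilution denominators, assuming each tube doubles the previous one."""
    return [first * 2 ** i for i in range(tubes)]


def tabulate(readings):
    """Pair each tube's dilution with its reading and the matching color."""
    denominators = dilution_series(tubes=len(readings))
    return [
        (f"1:{d}", r, READING_COLORS.get(r, "not described in the text"))
        for d, r in zip(denominators, readings)
    ]


# Hypothetical readings for ten tubes, for illustration only.
hypothetical_readings = [5, 5, 5, 5, 4, 3, 1, 0, 0, 0]
for dilution, reading, color in tabulate(hypothetical_readings):
    print(f"{dilution:>7}  {reading}  {color}")
```

Plotting the reading against the tube number gives the kind of curve the passage refers to.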

3. Cell count for spinal fluid. Increase in the num- 
ber of cells in the spinal fluid is found in practically all 
cases of paresis and cerebral syphilis, in the majority of 
cases of lethargic encephalitis in the acute stages, and in 
acute infections of the meninges. Red cells, indicating 
blood in the spinal fluid, may be found in cases of trau- 
matic injury to the brain or spinal cord, tumors, and 
hemorrhages. Normal spinal fluid contains a few white 
cells but no red cells. 

4. Sugar. Chemical analysis of spinal fluid for sugar 
is sometimes made, since in cases of sleeping sickness and 
dementia praecox there is often an increased sugar con- 
tent in the cerebrospinal fluid. 

III. Basal Metabolism 

Metabolism tests measure the speed of chemical reac- 
tions which go on in the living cells in connection with 
the immediate maintenance of the living state. Metabolism
is usually measured by the amount of oxygen which the
person consumes in a resting state after a fast of at least 
six hours. Expressed as a basal metabolic rate, it is the 
ratio of the amount of oxygen actually consumed in a 
unit time compared with the amount which the person 
should normally use according to his age, height, and 
weight. High metabolic rates are expressed as plus val- 
ues over 0, as +18; low metabolic rates as minus values 
below 0, as −21.




In mental disorders, measurements of basal metabolic 
rates are particularly important in the diagnosis of thy- 
roid disturbances. Low metabolism is considered prac- 
tically diagnostic of thyroid insufficiency; and high me- 
tabolism of hyper-functioning of the thyroid. 

IV. Blood Pressure 

Blood pressure represents a reaction between the walls 
of the blood vessels and blood in the vessels. The main- 
tenance of normal blood pressure depends primarily up- 
on elasticity of the walls of the blood vessels and the 
integrity of the heart muscle. Abnormally high blood 
pressure usually means hardening of the blood vessel 
walls. In old people there is a marked tendency for loss 
of elasticity of these walls, with consequent rise in blood 
pressure. In conditions of arteriosclerosis, the harden- 
ing becomes quite marked and the pressure extremely 
high. Arteriosclerotic conditions in the blood vessels of 
the brain are invariably accompanied by mental symp- 
toms. 

Low blood pressure associated with an inefficient circu- 
latory system may be of diagnostic value in some mental 
disorders. Such a condition is very commonly found in 
neurasthenia and frequently occurs in dementia praecox. 



CHAPTER XXV 

Glandular Function Tests 


I. Nature of the Endocrine Glands and Their 
Importance to Psychology 

The endocrine glands are relative newcomers in the
field of psychological measurement. While their 
existence has been known for a considerable time, quan- 
titative measurements in the field of gland study have 
had to await years of physiological and chemical research 
into the nature and effect of the glandular secretions. 
Even now this research should probably be regarded as 
just at its beginning. We shall find that in most in- 
stances the quantitative measurements have reached 
only a very crude state of development. 

The endocrine glands are relatively small collections of 
specialized cells located at scattered places in the body. 
They exert their influences by manufacturing and pour- 
ing into the blood stream chemical substances (endo- 
crines, internal secretions, or hormones) which, through 
their effect on various parts of the body, markedly influ- 
ence the development and behavior of the individual. 
The hormones have often been called chemical regulators 
of the body. In this capacity they preside over four 
main functions: (1) growth and development, including 
both physiological and mental growth; (2) sex and re- 
production; (3) nutrition and general metabolism; and 
(4) secretion from other glands, including maintenance of 
proper glandular balance with reference to the secretions 
of all the glands. 




Chief among the endocrine glands of interest to the 
psychologist are the thyroid, the adrenals, the gonads (sex 
glands), the pituitary, the pancreas, and the parathyroids. 

The thyroid gland, which encircles the windpipe, is the 
chief regulator of metabolism. It is the chief determi- 
nant of the rate at which the various physiological and 
psychological functions of the body take place. It has 
often been termed the gland which regulates our “speed 
of living.” 

The adrenal glands, which lie just above the kidneys, 
produce two major hormones, adrenalin and cortin. Ad- 
renalin is the hormone liberated in increased amounts in 
emotional states, and in other states of extreme exertion. 
Cortin is a hormone necessary to the maintenance of cir- 
culatory efficiency, and in turn to the maintenance of 
life. Deficiencies lead to circulatory collapse, weakness, 
and finally death. Overproduction of cortin has been 
noted by some to influence sex characteristics. Tenden- 
cies toward masculinity in the female are sometimes 
based upon excessive adrenal function. 

The gonads profoundly influence man both psycholog- 
ically and biologically. They determine in part the de- 
velopment of the various secondary sex characteristics, 
such as voice, bodily contours, size, emotional differences; 
in short, those things which determine the characteristics 
of masculinity and femininity. Biologically, endocrine 
secretions from the sex glands have many important roles 
to play in the process of reproduction. 

The pituitary gland, situated at the base of the brain, 
produces several different hormones. One of these hor- 
mones controls skeletal growth. Excessive functioning 
may produce gigantism; deficient functioning, dwarfism.
Circus giants and midgets are most frequently the result
of such dysfunctions. Other important hormones from
the pituitary gland play a role in the sex cycle, and
are essential to the normal functioning of sex glands and 
to the normal development of those traits characterizing 
the two sexes. 

The pancreas is a gland only part of which is endo- 
crine in nature. Its endocrine function is in relation to 
carbohydrate utilization in the body. Deficiency in this 
function produces a disturbance (diabetes mellitus) in 
which there is a marked rise in the sugar content in the 
blood. This condition is not infrequently accompanied 
by profound symptoms in the mental sphere. 

The parathyroid glands have to do primarily with the 
metabolism and utilization of calcium in the body. 
Their mental and psychological relationships depend on 
the fact that disturbances in the calcium level are likely 
to be reflected in disturbances in nervous system sensitiv- 
ity and irritability. Low blood calcium is likely to pro- 
duce increased neural sensitivity, which is manifested in 
conditions of tetany and convulsion. 

The scope of this book does not permit a detailed dis- 
cussion of the function of the various endocrine glands. 
This brief resume of a few of them has been given merely 
to suggest their importance to those interested in human 
behavior. 

The psychologist is interested in endocrine glands, and 
in measurements of their level of activity and efficiency 
of function, primarily because of their relationship to 
(1) mental development, (2) personality, (3) emotional 
reactions, and (4) mental disorders. An example of en- 
docrine influence on mental development is furnished by 
cretinism. This is a condition caused by complete ab- 
sence or defective functioning of the thyroid gland, oc- 
curring either congenitally or in early childhood. It 
occurs sporadically all over the world and is of very fre-
quent occurrence in certain districts where there is a
deficiency of iodine — the chemical necessary for manu- 
facture of thyroid secretion in the body. The symptoms 
of cretinism are usually noticeable during the first year 
of the child’s life. Since the thyroid gland has the main 
function of presiding over general bodily metabolism, of 
controlling the speed of the chemical processes taking 
place in the body, these processes in the cretin go on at 
such a low ebb that he simply remains at an infantile 
level of development. Physically, his face retains the 
flattened features and broad expanse between the eyes 
characteristic at birth. He does not grow in stature and 
his growth is likely to be ill-proportioned. The legs and 
arms remain short, the abdomen protrudes and assumes 
a size out of proportion to the rest of the body. Men- 
tally, the cretin is very much retarded. He makes few 
intelligent responses to his environment. He is unable to 
learn those things which the normal child learns. He 
does not begin to talk at the normal age, and his general 
adjustment to his surroundings is markedly below par. 
If untreated, he is likely to live his life at the level of 
an idiot or an imbecile. 

Almost all glandular disturbances produce some effect 
upon personality. Among the cases short of actual men- 
tal disorder, we may cite the hyperthyroid as an example. 
These individuals are likely to have a personality colored 
by restlessness, by a sort of anxiety, and by a hyperex- 
citability to emotional stimuli. Their emotional control 
is usually somewhat defective and they are easily aroused 
to fits of temper. They often possess an extreme degree 
of “pep and energy,” which may manifest itself in nerv- 
ousness unless constantly directed toward the accom- 
plishment of some task. 

Another outstanding example of endocrine influence
upon personality is furnished by the condition usually
referred to as Fröhlich's syndrome, which results from in-
adequacy of function of the pituitary gland in its rela-
tion to sex gland function. Clendening has given a
graphic picture of this condition:

It begins before adolescence and occurs mostly in 
males, though it is found in females. The boys are fat, 
feminine, weak, and misunderstood. Their manifest 
deformities are regarded by their parents, teachers, and 
playmates as natural and inevitable variations of human 
structure, rather different, but within the normal limits. 
They are sissy because they are sissy: some boys are 
sissy. They are fat because some people are fat. They 
are weak and do not play boys’ games because they are 
sissy. That is the usual view. It is not commonly rec- 
ognized that they are definitely in a mutual deficiency 
group, that the deficiency is an affair of internal secre- 
tion, and that treatment to be effective at all must begin 
early in life.¹

As an example of endocrine function in emotional 
states we have already discussed in Chapter XXI the 
role of the adrenal glands. These have an important 
function to play in the production of the physiological 
state which is characteristic of all strong emotions. The 
thyroid gland also is related to emotional functioning. 
Long periods of emotional excitement tend to produce 
hyperthyroidism with various attendant symptoms. 
During the World War, for example, there was a striking 
increase in the number of cases of this type, presumably 
to be accounted for by the great emotional stress to which 
groups of people were subjected. 

¹ Clendening, L., Modern Methods of Treatment, C. V. Mosby Co.,
St. Louis, 1928, p. 232.

Among the mental disorders or insanities of endocrine
origin we may note one of profound depression: involu-
tional melancholia. This is a disorder occurring at the
period of involution, when there is a regression in sex 
gland function with a subsequent general glandular im- 
balance. A few who are unable to weather the storm 
of this glandular readjustment and its effects on the nerv- 
ous system become afflicted by an extreme depression. 
They develop a profound melancholia which someone 
has characterized as a “saturated solution of grief.” 
They are continually beset with anxiety, worry, and de- 
spair, which have little relation to the actual circum- 
stances of their existence. They often pace the floor for 
hours wringing their hands and moaning continually. 
Behavior of a voluntary, directed sort is at a standstill. 
They are unable to bring themselves to accomplish even 
simple tasks that need to be done. They are often neg- 
ativistic toward doing the biddings of others, this neg- 
ativism often being carried so far as to refuse food over 
long periods of time. Delusions, or false beliefs, are fre- 
quent — the commonest types are beliefs of sinfulness or 
unworthiness. Hypochondriacal delusions or absurd be- 
liefs about one’s body are also of frequent occurrence. 

These examples of glandular influences on our psycho- 
logical makeup are only a few of many that might be 
cited. Mentality and mental growth, while affected
most markedly by the thyroid gland, may be affected more
or less indirectly by other glands. Personality is af-
fected probably by every endocrine gland in the body
and, furthermore, by the particular balance in which the 
various glands are maintained in their relationship to each 
other. Some students of personality have even gone so 
far as to attempt to label personality types according to 
particular gland-function dominances. While much of 
this classification is unsupported by evidence, we need 
not discard the general idea back of it. Emotions may
be most often discussed in relation to the adrenal glands,
but the difference between the emotionality of the hyper- 
thyroid and that of the hypothyroid must suggest also 
the importance of this gland, and in the emotions of love 
and sex life the sex and the other glands involved in the 
sex cycle are of prime importance. Finally, as etiological 
agents in mental disease, the endocrines are involved 
in many of the psychoses. There are the thyroid psy- 
choses associated with toxic conditions of the thyroid; 
epilepsies of endocrine origin; manic depressives and 
dementia praecoxes of glandular nature; and neuras- 
thenias of pituitary dysfunction. 

These examples in themselves will suggest the im- 
mense importance to the psychologist of studies of en- 
docrine function. They are related not only to the 
specific aspects of behavior constituting the four cate- 
gories which we have mentioned, but also to the problems 
of school progress, social adjustment, vocational guidance, 
and industrial and vocational efficiency. 

II. Measurement in Relation to Endocrine Function 

So far as quantitative measurement is concerned, we 
can readily appreciate the essential nature of a method 
to determine the precise degree of glandular activity of 
each of the endocrine organs. Unfortunately such meth- 
ods are not generally available at the present time. 
When the different hormones have been chemically iden- 
tified and the normal quantity of each in the blood has 
been analyzed, we can expect to have at our disposal a 
quantitative index by which to gauge the level of their 
activity. So far, only very few of the endocrine secre- 
tions have been chemically analyzed, and even for these 
there is no direct quantitative test which has proved 
practical. At present the level of functioning of the
endocrine glands is generally studied by indirect methods,
and our quantitative measurements of their functions are 
usually in terms of their effect upon some part of the 
human body or upon some aspect of animal behavior. 
Some of the tests have reached the specificity of measur- 
ing the effect of the glandular secretion upon a specific 
part of the bodily function which seems to be controlled 
by the gland. This is illustrated by the measurement of 
insulin production by the pancreas in terms of blood- 
sugar level. The insulin of the pancreas is the important 
controlling factor in blood-sugar level; hence, we meas- 
ure amount of insulin in terms of the sugar content in 
the blood. Others of the quantitative measurements of 
glandular function are in terms of much larger effects in 
the body or in terms of more general results of the en- 
docrine function. An example in this category is the 
measurement of thyroid function in terms of its effect 
upon general body metabolism. These general measure-
ments are more likely to be affected by other factors, and
the measurements can usually be taken as indicative of 
glandular function only if the conditions under which 
the measurement was made are rigidly controlled. Fi- 
nally, there is a considerable group of measurements of 
glandular function based upon the effect upon test ani- 
mals of fluids of the body which contain the endocrine 
products. Most of the tests of sex hormones and pitui- 
tary hormones are based upon such measurements. For 
example, the level of ovarian (female sex) hormone in 
the blood is measured by its effect upon the sex apparatus 
of an immature animal such as a mouse, rat, or rabbit. 

Representative quantitative tests will be described 
briefly. No attempt will be made to give exact techni- 
cal procedures. The tests can ordinarily be carried out 
only after a certain amount of technical training in the
procedures, and for discussion of these procedures the
reader must be referred to textbooks on the subject. 

1. The basal metabolism test for thyroid function. 
This test aims essentially at finding the rate at which 
the metabolic processes go on in the body. In order to 
understand the nature of the test, let us represent the 
metabolic processes as chemical changes in the nature of 
oxidation processes: 

Fuel Material + Oxygen → Energy or Activity + Waste Products

Since the rate of this process is primarily controlled by 
thyroxin from the thyroid, acting in the nature of a 
catalyst, as some have expressed it, thyroid function may 
be measured if the rate of this process can be measured. 
It is conceivable that the rate of liberation of waste prod- 
ucts or utilization of fuel would be proportional to metab- 
olism or to the rate at which the whole process goes on. 
But these are hard to measure. Because it is easily 
determined, oxygen consumption is usually measured. 
It is easily measured because it is consumed alike 
by everybody, because it is not stored to an appre- 
ciable extent in the body, and because it can be designated 
in easily measurable units. A test of metabolic rate is, 
therefore, carried out by measuring the amount of oxy- 
gen which a person consumes in a given time. In order 
that all tests may be taken under the same conditions 
and that there shall not be present varying factors which 
may influence the metabolic rate, the test is done in a 
resting state after a fast of at least six hours. Basal 
metabolism machines for making such a test consist of 
an arrangement for administering pure oxygen to the 
subject and of measuring the amount of this consumed 
over a given time, usually eight or ten minutes. The
amount consumed is often recorded graphically by the
drop in a record line which is made on a chart attached 
to a revolving drum. One of these charts was presented 
in the discussion of metabolic changes in fatigue (see 
Fig. 25). The result of a metabolism test is usually ex-
pressed as a basal metabolic rate, which compares
the amount of oxygen actually consumed in a unit time
with the amount which a person should nor-
mally use according to his age, height, and weight. High
metabolic rates are expressed as plus values over zero, as
+25; low metabolic rates, as minus values below zero,
as -21.
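
The arithmetic behind such plus and minus values may be sketched as follows. The sketch assumes that the rate is reported as the percentage by which the observed oxygen consumption departs from the amount predicted for the person's age, height, and weight; the function names and the oxygen figures are hypothetical and serve only as illustration.

```python
def basal_metabolic_rate(observed_o2, expected_o2):
    """Return the percentage by which observed oxygen use departs from the expected amount.

    Assumption: both amounts are measured over the same unit of time and in the
    same units (for example, cc. of oxygen per minute).
    """
    if expected_o2 <= 0:
        raise ValueError("expected oxygen consumption must be positive")
    return 100.0 * (observed_o2 - expected_o2) / expected_o2


def format_rate(rate):
    """Write the rate with an explicit sign, in the manner of +25 or -21."""
    return f"{round(rate):+d}"


# Hypothetical figures: observed versus predicted cc. of oxygen per minute.
print(format_rate(basal_metabolic_rate(300.0, 240.0)))  # +25
print(format_rate(basal_metabolic_rate(189.6, 240.0)))  # -21
```

A value of zero on this scale would mean that the person consumed exactly the predicted amount of oxygen.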

2. Blood-sugar test for insulin production. As has al- 
ready been suggested, pancreas function in its produc- 
tion of the hormone insulin is measured in terms of 
blood-sugar content. Blood-sugar content can be taken 
as a fair indication of production of insulin, since rarely 
are there marked variations in blood sugar due to other 
causes. The blood-sugar test is a quantitative chemical 
test in which the number of milligrams of blood sugar 
per hundred cubic centimeters of blood is determined. 
In the normal individual, the blood contains about 100 
mg. of sugar per hundred cc., limits of 90 and 120 mgs. 
being widely accepted as lower and upper normal limits. 
Excessive amounts of blood sugar are practically always 
indicative of insulin deficiency. Excessively low blood 
sugar is usually indicative of amounts of insulin above 
normal, usually met only in excessive artificial adminis- 
tration of insulin in diabetics. Blood-sugar level and 
insulin function afford excellent examples of the depend-
ence of mental life upon chemical balance. In most cases
of diabetes, the mental disturbances are of a mild
type, perhaps only a mild depression. As the blood-sugar
content grows higher and higher, coma, with complete
loss of consciousness, finally occurs. Inject into the in- 
dividual some insulin, and his blood-sugar level decreases, 
the balance is swung in the other direction, and he re- 
sumes mental functioning. Continue to give insulin, and 
the individual will again pass into a state of coma due 
to too little blood sugar. The balance can be swung 
back to normal by administration of glucose (sugar). It 
seems from this that, in order to maintain normal mental 
functioning, a remarkable chemical balance is necessary 
with respect to insulin production and blood-sugar level. 

3. Measurement of parathyroid function. Secretion 
of parathyrin by the parathyroid glands may be meas- 
ured in terms of calcium content in the blood. This may 
be considered proportional to the parathyroid function, 
since one of the chief roles of the parathyroid is that of 
controlling calcium metabolism in the body. Like the 
blood-sugar test, the calcium test is one of quantita- 
tive determination of the amount contained in a unit 
quantity of blood. Normal calcium figures are from 9 
to 11 mgs. per hundred cc. of blood. A decrease in this 
index suggests parathyroid deficiency, and, in the pres- 
ence of other indications of parathyroid disturbance, 
would call for parathyroid therapy. 
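
The normal limits quoted for the blood-sugar and calcium tests lend themselves to a simple comparison, sketched below. The limits are those given in the text; the function name and the sample values are hypothetical, and the sketch does nothing more than place a figure below, within, or above the quoted limits. It is not a diagnostic procedure.

```python
# Normal limits as quoted in the text, in mg. per hundred cc. of blood.
NORMAL_LIMITS = {
    "blood sugar": (90.0, 120.0),
    "calcium": (9.0, 11.0),
}


def classify(test, value):
    """Place a laboratory figure relative to the quoted normal limits."""
    low, high = NORMAL_LIMITS[test]
    if value < low:
        return "below normal"
    if value > high:
        return "above normal"
    return "within normal limits"


# Hypothetical readings for illustration only.
print(classify("blood sugar", 100.0))  # within normal limits
print(classify("blood sugar", 180.0))  # above normal (the text links this to insulin deficiency)
print(classify("calcium", 7.5))        # below normal (the text links this to parathyroid deficiency)
```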

4. Measurement of ovarian hormone. The ovary, or
female sex gland, secretes an important hormone termed
oestrin, which has a marked influence on the sex cycle
and on changes incident to pregnancy. It is this hor-
mone which is responsible for production of the oestrous
state (“heat” in animals). The measurement of oestrin
content in the blood is of importance in many clinical
conditions in which sex function is disturbed. Quanti-
tative estimation of oestrin level in the blood depends
upon its effect in producing an oestrous state in a female
animal from which the ovaries have been removed. The
test is made by administering to a mouse, from which 
the ovaries have been removed 14 days previously, con- 
centrated blood from the individual to be tested. Twen- 
ty-four hours and forty-eight hours after the injection 
has been given to the mouse, an examination is made to 
determine whether the injection has produced an oestrous 
state. Since the oestrous state is manifested by changes 




Fig. 28. Technique of Blood-Oestrin Determination. [Diagram: blood mixed with 36 gm. Na2SO4 until dryness occurs; mixture pulverized; ether extract emulsified in water; emulsion in vial ready for use; castrated mouse injected; vaginal secretion examined twice daily.]


taking place in the vagina, the test consists of an exam-
ination of vaginal smears. Mazer and Goldstein² have
graphically depicted the technique of blood-oestrin de-
termination as shown in Fig. 28. Blood-oestrin content
at various stages in the sex cycle in the non-pregnant 

2 Mazer, Charles, and Goldstein, Leopold, Clinical Endocrinology of
the Female, W. B. Saunders Co., Philadelphia, 1933, p. 156.



individual and in the pregnant individual is indicated 
in Fig. 29.³ Gonadal dysfunctions may show deviations
from this normal curve. During pregnancy the oestrin 
level shows a continued rise, beginning early in preg- 
nancy and reaching its peak just before birth of the 
child. 



Fig. 29. Blood-Oestrin Content at Various Stages of the Sex Cycle.

A. Oestrin occasionally demonstrable at this time in 40 cc. of blood.
B. Oestrin demonstrable in 94% of normal women at this period in 40 cc. of blood.
C. Oestrin content falls during menstruation.
D. Oestrin content drops in first two months of pregnancy owing to excessive secretion.
E. Oestrin demonstrable by injections of whole blood into animals.

5. Hormone tests of pregnancy. Hormone tests for
pregnancy are generally based upon the presence in in-
creased amounts, either in the blood or in the urine, of
oestrin, which we have just discussed, or of anterior pitu-
itary sex hormones. Since the anterior part of the
pituitary gland secretes one or more hormones which
have a part in the sex cycle and are essential to the normal
functioning of the sex gland, the presence of these pitu-
itary hormones or their variations is often just as indic-
ative of changes in sex gland functions as are changes in


3 Id., p. 166.





the sex hormones themselves. Their utilization in tests 
of pregnancy is possible because of their secretion in 
increased amount. During pregnancy, the blood level 
reaches such a high point that these hormones may also 
be thrown off in detectable amounts in the urine. Since 
the tests based upon urine analyses can be made as easily 
and with less inconvenience to the individual tested, they 
are the tests usually employed in the hormone determi- 
nation of pregnancy. 

If the presence of oestrin in increased amounts is used 
as a test for pregnancy, the procedure carried out is very 
similar to that described in the last section. The anterior 
pituitary sex hormone tests of pregnancy are carried out 
by injecting the urine of the suspected case of pregnancy 
into an immature female animal (mouse, rat, or rabbit). 
The ovaries and other parts of the reproductive appa- 
ratus of the animal are then examined, after a suitable 
length of time, to ascertain the effect of the injection. 
If the injection is from a pregnant individual and has 
contained anterior pituitary sex hormone, the immature 
animal will have been stimulated to a state of sexual 
maturity, the ovaries will show evidence of ovulation, 
and other changes indicative of sexual stimulation will 
appear in the uterus and uterine tubes. 

These tests have been of immense benefit to the clini- 
cian in making possible very early diagnoses of pregnancy 
and in enabling distinctions to be made between preg- 
nancy and other conditions which may simulate it. 



CHAPTER XXVI 


Physiological Measurements of Emotion 

There are many reasons for desiring to measure the
emotions or the changes indicative of emotional re- 
actions. Because of the “emergency” nature of emotion, 
particular significance attaches to its relation to conduct. 
Emotion itself calls for immediate reaction with little de- 
liberation and with little weighing of the alternatives in 
the situation, a procedure which is conducive to the most 
satisfactory, the most social, or the best reaction in the 
long run only when previous reactions have established 
a good precedent for similar situations. Our specific in-
terest may be in the individual differences in emotional
reaction: in the individual variations in degree of emo-
tion resulting from the same stimulus or situation; or in
the individual variations in range of stimuli capable of 
arousing emotion; or in the individual variations in the 
expression of the emotions. At another time, we may 
pursue the measurement of emotions as a means of dis- 
covering bases or causes for impulsive action, unsocial 
behavior, or abnormal conduct. Again, we may study
the measurement of emotions as a means to increased 
knowledge of the nature of the emotion itself. 

We have discussed in Chapter XXI the attempts, 
through verbal tests, to measure conduct involving emo-
tional elements. These measurements are for the most
part measures of conduct, of adjustment or maladjust- 
ment to one’s surroundings. The conduct is often, per-
haps usually, related to the emotions, but the meas- 
urement is not a direct test of the emotional state. The 
measurements are not even in a medium applicable to 
the emotions themselves. The tests are verbal. Emo- 
tions themselves cannot be verbalized; they are a func- 
tion of the unconsciously acting visceral, glandular, and 
autonomic systems of our make-up. What we need in 
studying many of the problems connected with emotions 
is a more direct measure of the changes which are char-
acteristic of the emotional state.

I. The Nature of Physiological Measurements 
of Emotions 

The measurements we are about to discuss concern the 
physiological changes which accompany or constitute the 
emotional state. A brief review of these would seem 
pertinent. In most emotional reactions, the body is pre- 
pared for an immediate and intense activity through the 
action of the autonomic nervous system. For the signifi- 
cance and biological usefulness of this response, we must 
go back in the evolutionary scale behind the human 
species. In the lower animal, the natural consequence 
of fear is flight; of rage or anger, combat. Both of these 
emotions lead to natural responses which make immedi- 
ate demands upon the body for energy production. Na- 
ture has taken care of the situation by a mechanism of 
response, connected with the emotion, that prepares the 
body for the demands. In our own cases, we as a species 
may not always make the activity response: we may not
always flee from the fear-producing stimulus or engage 
in combat with our rage-producing opponent; neverthe- 
less, we have not lost the biologically developed tenden- 
cies to react in such a fashion. In strong emotions, our 
bodies are still prepared for the emergency responses. 



Among these preparations, we find an increased activ- 
ity in the circulatory system — a faster heart beat and a 
higher blood pressure — for more efficient transportation 
of fuel material and removal of waste products to and 
from the scene of activity. Blood is withdrawn from the 
digestive system to give the muscles a greater supply; 
hence digestive activity is in abeyance. Blood sugar is 
released from its storage places in the body to furnish 
extra fuel material. Sweat glands increase their activity 
to help in regulating body temperature in a situation 
likely to result in increased heat production. Breathing 
becomes more rapid to increase the oxygen supply for 
activity processes and for the removal of waste products 
in the form of carbon dioxide. Adrenalin is secreted 
from the adrenal glands to help in bringing about many 
of these changes. 

We sometimes test such physico-chemical changes as 
these when measuring emotions. Before we describe in 
more detail some of the physiological tests, a few 
general considerations regarding the methods should be 
mentioned: 

1. Practically all the physiological measurements of 
emotions necessitate the use of rather delicate types of 
instruments, many of which require a skill for their op- 
eration. This fact limits the usefulness and general ap- 
plicability of the methods, and in the hands of unskilled 
workers makes for inaccuracy of results. In most of the 
work of measuring emotions through physiological 
changes, automatic methods of recording have been used. 
Such procedures are to be recommended wherever pos- 
sible. The automatic recording is usually accomplished 
by the use of a kymograph — a drum revolving at a con- 
stant, known speed, on which is attached some kind of 
record paper. The changes, often amplified, are recorded
by a pen or writing point on the record paper. Such an 
apparatus is pictured in Fig. 30, which shows the record- 
ing of stomach contractions. 

2. Many of the measurements can be made only under 
conditions of considerable inconvenience to the subject. 
Measurements of stomach contractions, for instance, usu- 
ally necessitate the swallowing of a rubber bulb and tube. 



Fig. 30. Diagram Showing Method Used to Record Stomach
Contractions.


Biochemical measurements often necessitate the punctur- 
ing of veins to draw off a quantity of blood. 

3. Most of the physiological measurements do not 
distinguish between the various emotions. They simply 
measure emotional excitement, and the records for fear 
may be indistinguishable from those for anger. 

4. It is impossible to establish general standards of 
normality as useful norms by which to judge results of 
individual tests. Except for extreme or severe varia- 
tions, a record during emotion means little except as 
compared with a “normal” taken on the same sub-
ject under conditions similar except for the emotional 
occurrence. 

5. Emotional reaction is only one of many factors 
which may affect most of the physiological states meas- 
ured. Blood pressure for instance is sensitive to many 
factors (listed in a subsequent paragraph of this chapter). 
The number of influences on a physiological state limits
the usefulness of that state as a measurement of the in-
fluence of one factor, and makes it useful at all only when
the other factors are controlled.

With these facts in mind, let us briefly examine some 
of the physiological tests by which quantitative studies
of the emotions have been made. We shall discuss (1) 
blood-pressure measurements; (2) breathing measure- 
ments; (3) measurements of digestive system activities; 
(4) psychogalvanic reflex measurement; (5) measure- 
ments of increased adrenalin secretion; and (6) measure- 
ments for detection of deception and lying. 

II. Blood-Pressure Measurements of Emotion

Blood pressure is that pressure maintained against the 
walls of the arteries due to the force of the heartbeat, 
the resistance in the capillary blood vessels, and the pres- 
sure of the blood-vessel walls. This pressure, through 
changes brought about in the force and rate of the heart 
and in the character of the vessel walls, is sensitive to 
emotional excitement. Hence the utilization of blood 
pressure as a measure of emotion. 

Blood pressure is ordinarily measured by noting the 
force or pressure necessary to collapse an arterial vessel 
and prevent the pulse from passing a given point. The 
brachial artery in the arm is the one commonly employed 
for measurement. The instrument used is known as a
sphygmomanometer. It consists essentially of (1) a 
wrapping band for the arm with a rubber bag inside 
which can be inflated and thus increase pressure on the 
arm; (2) a rubber bulb for inflating the bag in the arm 
band; and (3) a pressure gauge connected with the rub- 
ber bag. Measurement of systolic blood pressure (the 
measurement usually employed in studies of emotions) 
is made by inflating the bag to a pressure which is just 
barely sufficient to cut off the pulse that can be felt or 
heard below the point of application of the band. Such 
a pressure must be equal to the pressure of the blood, 
since any lower pressure in the band allows the blood 
to pass through and causes a pulse in the arm or wrist. 
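
The reasoning of this procedure can be put as a small sketch: among a series of trial pressures in the bag, the systolic pressure is taken as the lowest at which the pulse is no longer felt. The function name, the unit (millimeters of mercury), and the readings are assumptions made for illustration, and the sketch supposes that the pulse is present at every pressure below the one returned.

```python
def systolic_pressure(readings):
    """Return the lowest bag pressure at which the pulse is cut off.

    readings: list of (bag_pressure, pulse_felt) pairs, where pulse_felt is True
    when a pulse can still be felt or heard below the band.
    """
    cut_off = [pressure for pressure, pulse_felt in readings if not pulse_felt]
    if not cut_off:
        raise ValueError("the pulse was never cut off; the bag must be inflated further")
    return min(cut_off)


# Hypothetical readings, in millimeters of mercury.
readings = [(100, True), (110, True), (120, True), (125, False), (130, False)]
print(systolic_pressure(readings))  # 125
```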

A number of investigations have been made of blood 
pressure in emotional states. All demonstrate that emo- 
tional excitement, short of a severe reaction of “shock,” 
is accompanied by rise in pressure. Many of the studies 
have been aimed at applying the results to such practical 
problems as that of detecting lying. This utilization of 
measurements is discussed in a subsequent section of the 
chapter. 

The drawbacks to blood-pressure measurements of 
emotions seem to be two: first, the fact that the measure-
ments are not specific with respect to type of emotional
experience; and second, the fact that blood pressure is 
affected by many things other than emotion, and that 
these cannot always be ruled out in the utilization of the 
measurement in studying emotions. Symonds,¹ in his
discussion of blood pressure and emotions, lists 18 factors 
besides emotion that affect the pressure. All these 
should be borne in mind in the interpretation of emo- 
tional records. Symonds’ list is given briefly here. 

1 Symonds, Percival, Diagnosing Personality and Conduct, The Century
Co., New York, 1931.



1. Blood pressure varies directly with the volume of 
the blood. 

2. Blood pressure varies directly with the energy of the 
heart. 

3. Blood pressure varies with the elasticity of the 
blood vessels. 

4. Blood pressure varies with the peripheral resistance. 

5. Gravity causes variations in blood pressure. 

6. Blood pressure is higher on the average in man than 
in woman. 

7. Blood pressure varies with the size of the animal. 

8. Blood pressure increases with age. 

9. Breathing causes variations in blood pressure. 

10. A rhythmical rise and fall of the blood pressure oc- 
curs, known as the Traube-Hering Waves. 

11. Blood pressure is higher after food has been taken 
into the system. 

12. Muscular exercise causes a rise in blood pressure. 

13. Cold baths and hot baths produce a rise in blood 
pressure. 

14. Blood pressure falls with severe fatigue. 

15. Blood pressure is lower during menstruation and 
raised during pregnancy. 

16. Pain causes a rise in blood pressure. 

17. Cold produces a rise in blood pressure. 

18. Certain glandular products cause marked changes 
in blood pressure. 

From the above, it would seem obvious that conditions 
must be carefully controlled if presence of emotion or 
degree of emotion is to be detected by blood pressure 
changes. It also seems clear that we cannot compare 
blood pressure obtained at one sitting with pressure at 
another sitting, or pressure in one person with pressure 
in another. Even under the best of conditions, blood 
pressure will probably be of little indicative value ex- 
cept in rather intense emotions, in which the effect of the 
emotion would outweigh the lesser forces causing blood 
pressure to vary. 
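
The kind of comparison recommended here, a record during emotion set against a “normal” taken on the same subject at the same sitting, may be sketched as follows. The rule that the rise must exceed twice the spread of the quiet readings is an arbitrary assumption chosen for illustration and is not drawn from the studies discussed; the function names and the readings are likewise hypothetical.

```python
from statistics import mean, stdev


def rise_over_baseline(baseline, during_emotion):
    """Return (rise, baseline_spread) for one subject at one sitting."""
    rise = mean(during_emotion) - mean(baseline)
    return rise, stdev(baseline)


def clearly_raised(baseline, during_emotion, factor=2.0):
    """Judge whether the rise stands out from the ordinary variation at rest.

    Assumption: 'factor' times the standard deviation of the quiet readings is
    used as a rough allowance for the lesser forces causing pressure to vary.
    """
    rise, spread = rise_over_baseline(baseline, during_emotion)
    return rise > factor * spread


# Hypothetical systolic readings from one subject at one sitting.
quiet = [118, 120, 122, 120]
print(clearly_raised(quiet, [138, 142, 140]))  # True: an intense reaction
print(clearly_raised(quiet, [121, 122, 120]))  # False: lost in ordinary variation
```

Readings from another sitting or another person are deliberately given no place in the sketch, in keeping with the caution stated above.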



III. Measurement of Breathing 

Breathing has several roles to play in its biological 
usefulness in the “emergency and preparatory” reaction
of an emotional state. It is the means of intake of oxy- 
gen, which will be demanded in increased quantities in 
any emergency reaction of the organism. It is the 
means of exhalation of one of the most important end 
products in the elimination of waste material of activity. 

Table XXVIII

EFFECTS ON RESPIRATION OF FEAR PRODUCED BY FALLING

Respiration Response      Informed Subjects   Uninformed Subjects
Change in rate            20% decrease        12% decrease
Duration of change        5 minutes           3 minutes
Change in I/E             281% increase       201% increase
Duration of change        5 minutes           3 minutes
Inspiration Stimulation   Marked              Marked

It is therefore logical for us to expect emotional excite-
ment to be manifested by changes in breathing. The as-
pects of breathing usually studied are its rate, its depth,
and the inspiration-expiration ratio (duration of inspira- 
tion divided by duration of expiration). Table XXVIII, 
from a study by Blatz,² gives the breathing changes for
subjects stimulated to a fear reaction by the release of 
the chair in which they were sitting so that they were 
allowed to fall backward about 60 degrees. Some of the 
subjects were informed about the procedure; some were 
not. It will be noticed that rate of breathing decreased, 
but the I/E ratio increased, being over twice as great 
after excitement as before. These are common findings 

2 Blatz, W. E., “The Cardiac, Respiratory and Electrical Phenomena
Involved in the Emotion of Fear,” Journal of Experimental Psychology,
Vol. VIII, 1925, pp. 109-132.



in emotions, although there have been variations in re- 
sults reported by some investigators. 
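
The inspiration-expiration ratio defined above, and the percentage change in it, can be computed as in the following sketch. The durations are hypothetical and are not taken from the study by Blatz; the function names are likewise chosen only for illustration.

```python
def ie_ratio(inspiration_seconds, expiration_seconds):
    """Duration of inspiration divided by duration of expiration."""
    if expiration_seconds <= 0:
        raise ValueError("duration of expiration must be positive")
    return inspiration_seconds / expiration_seconds


def percent_change(before, after):
    """Percentage change from the value before excitement to the value after."""
    return 100.0 * (after - before) / before


# Hypothetical breaths: 1.0 sec. in and 2.0 sec. out before the stimulus,
# 1.5 sec. in and 2.0 sec. out after it.
before = ie_ratio(1.0, 2.0)   # 0.5
after = ie_ratio(1.5, 2.0)    # 0.75
print(percent_change(before, after))  # 50.0
```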

Breathing is subject to almost as great a variety of 
influences as blood pressure, and many are the same as 
those affecting blood pressure. Consequently, great care 
must be exercised in the control of conditions under 
which breathing changes in emotions are recorded. 

IV. Measurement of Digestive-System Activity 

Utilization of digestive-system activity in indicating 
emotional reaction goes, in story at least, back to the 
ancients. There is an old account that the ancients made 
those suspected of wrongdoing place dry rice in their 
mouths for a brief interval of time. If the rice came out 
dry, they were judged guilty. If the method is at all 
reliable, it is dependent upon emotional inhibition of 
salivary secretion in the guilty. 

Cannon³ was one of the first actually to demonstrate
experimentally that pleasant emotions are conducive to 
the best functioning of the digestive apparatus; and that 
agreeable surroundings, appetizing food, pleasant con- 
versation, and a placid mind are all contributing factors 
to the normal flow of digestive juices and the normal 
muscular activity of peristalsis. He demonstrated, on 
the other hand, that pain, fear, rage, excitement, worry, 
and anxiety have antagonistic effects on digestive 
functions. 

Cannon, and others following him, have employed 
three general methods for investigating digestive changes 
accompanying emotions. The first consists of the study 
of secretions of digestive juices through the observation 
in animals of the actions in a side pouch of the stomach 

3 Cannon, W. B., Bodily Changes in Pain, Hunger, Fear and Rage
(2nd ed.), Appleton-Century Co., New York, 1929.



which is brought to the outside by an operation. A few 
such studies have been made on human beings who, ow- 
ing to disease of the esophagus, have had to have gastric 
fistulae for the introduction of food. Cannon found in 
dogs with esophageal fistulae and with side pouches op- 
ening to the exterior, that presentation of food caused 
flow of gastric juices even during sham feeding, but that 
after the dogs were aroused to fear or anger, there was 
no flow of gastric juices on presentation of food, even 
though the animals were hungry and ate with relish. 

A second method utilized by Cannon enabled him to 
study peristaltic movements during emotional excite- 
ment. He fed his animals bismuth or some substance 
opaque to X-rays, and then observed the movements of 
their stomachs and intestines under various conditions. 
Thus he demonstrated the inhibitory effect of emotional 
excitement on peristaltic activity. 

The third method, by the use of a rubber balloon which 
the subject swallows, records stomach contractions. The 
method is shown diagrammatically in Fig. 30.

These digestive measurements are presented for their 
theoretical interest rather than for their practical value 
as yardsticks in the detection and measurement of emo- 
tional reactions. The inconvenience to the subject which 
their use entails is sufficient to bar them from extended 
practical use. 

V. Psychogalvanic Reflex Measurements 

The term psychogalvanic reflex (or psychogalvanic skin 
reflex, since the changes are usually measured by elec- 
trodes applied to the skin) refers to certain electrical 
changes which accompany emotional excitement. These 
changes are usually recorded by applying two electrodes 
to the skin of the body at two points, as on two fingers,
or at two points on the arms, in such a way that the body 
forms a part of an electric circuit in which there is a 
sensitive galvanometer to record the changes in electric 
current. These changes were among the first physiolog- 
ical accompaniments of mental and emotional activity to 
be studied. Such a study was reported as early as 1879 
by Vigouroux. Other early studies were conducted by 
Féré and Tarchanoff, their works being reported in 1888
and 1890 respectively. Since then the psychogalvanic 
reflex has been used in many investigations. 
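
Where the circuit described above includes a known, constant applied voltage, a change in the galvanometer reading can be translated into a change in the apparent resistance of the body by Ohm's law. The sketch below rests on that assumption, which the text does not state; the voltage, the currents, and the function names are hypothetical.

```python
def apparent_resistance(volts, amperes):
    """Ohm's law: resistance in ohms from applied voltage and measured current."""
    if amperes <= 0:
        raise ValueError("current must be positive")
    return volts / amperes


def percent_change_in_resistance(volts, current_before, current_after):
    """Percentage change in apparent resistance between two galvanometer readings.

    Assumption: the applied voltage is the same for both readings.
    """
    before = apparent_resistance(volts, current_before)
    after = apparent_resistance(volts, current_after)
    return 100.0 * (after - before) / before


# Hypothetical readings: 2 volts applied; current rises from 20 to 25 microamperes.
change = percent_change_in_resistance(2.0, 20e-6, 25e-6)
print(round(change, 1))  # -20.0, that is, a fall of one fifth in apparent resistance
```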

The main interest in the phenomenon as a test in the 
field of emotions has rested upon the fact that emotional 
changes often are paralleled by psychogalvanic reflex re- 
actions. Many students of the problem have demon- 
strated these reactions, but there is some doubt as to the 
consistency of the results and the proper interpretation 
to be given them. A review by Paterson led him to be- 
lieve that no characteristic psychological experience can 
be identified with any characteristic of the psychogal- 
vanic reflex curve. Landis has summarized the whole 
matter quite capably, concluding that the reflex is one of 
a series of responses linked with the autonomic nervous 
system, but that it either may appear with no emotional 
accompaniment or may not appear when emotion is quite 
definitely present. He believes it worth studying as a 
phenomenon, but not as a test of emotion. Not all in- 
vestigators are quite as pessimistic, however, as Paterson 
and Landis. 

VI. Emotion Indicated by Measurement of Adrenalin 
in the Blood Stream 

It is generally known today that emotional states 
cause a liberation of increased adrenalin from the ad-
renal glands. Why might we not utilize this physiolog- 
ical change as a measurement of emotion? 

Cannon was the first to make such a measurement. In 
demonstrating the presence of adrenalin, he made use 
of the inhibitory effect of adrenalin on intestinal muscle 
contraction. In order to determine whether there was 
increased adrenalin accompanying emotional disturbance, 
cats were placed near barking dogs and were excited by 
them. Blood was taken from a vein of a cat before and 
after exposure to the dogs. Strips of intestinal muscle 
from a freshly killed animal had been arranged so as to 
contract rhythmically in a normal salt solution, each 
muscle having been attached to a recording device to 
show its contractions. Fig. 31⁴ shows what happened



Fig. 31. Record Indicating Inhibitory Effect of Adrenalin on
Intestinal Muscle Contraction.


in the experiment. At (a) the solution in which the
muscle had been beating was removed; at (b) blood from
an excited cat was added; at (c) the “excited” blood was
removed; at (d) blood from a quiet animal was added
in its place; at (e) the “quiet” blood was removed; and
at (f) “excited” blood was again applied. The contrac-

4 Id., p. 51.




tion records showed that blood from the vein after emo- 
tional excitement caused a relaxation of the muscle, but 
“quiet” blood caused little or no change in contraction. 
Cannon concluded from such experiments that the blood 
after emotional excitement contained more adrenalin, 
since no other substance to be found in the blood could 
have produced such an effect. In order to confirm the 
finding further, he removed the adrenal glands from one 
of his cats and found then that the blood, after the cat 
had been excited, produced no change in contraction of 
the test piece of intestine. 

VII. Physiological Measurement of Emotion in 
Detection of Lying 

Of the practical attempts to utilize physiological meas- 
urements of emotions, those which have caused most dis- 
cussion have aimed to determine whether or not a person 
is lying in a given situation. From the scientific stand- 
point, the problem may seem a very small part of the 
study of physiology of emotions; but in the practical 
situations of criminal procedures there is often a real need 
for some device that will check the veracity of testimony 
and confessions. The instruments used in experimental 
work in this direction have earned the popular title of 
“lie detectors.” 

The theory underlying the work has been that when 
one is telling the truth there is no strong emotional re-
action, and that any physiological changes of an emo-
tional nature that are produced are relatively mild and
lack marked fluctuations. On the other hand, when 
one is lying, or is deliberately concealing the truth, he 
is on his guard and is under an emotional strain, which 
is reflected in the physiological changes accompanying 
emotions and in the marked fluctuations in physiological 



Physiological Measurements of Emotion 437 

levels at points of crisis in the story or testimony. The 
obvious suggestion that the hardened criminal can lie 
without such changes is denied by the investigators. 

Much of our information about physiological methods 
of detecting lying is derived from the investigations of 
Marston, Larson, and Landis. Marston studied in par- 
ticular the possibilities of blood pressure. He claims 
that there are blood pressure curves typical of truth and 
lying. According to him, the curve of lying shows a 
steady rise as questions are answered and testimony is 
given, reaching a peak with a crisis in the testimony. 
Truth curves are smoother; the blood pressure records 
pursue a relatively level course with only slight and few 
rises. Marston claims that rises due to incidental fac- 
tors in the truthful subject are always insignificant 
enough to cause little confusion. Larson has utilized in 
most of his work a combination of records, including
blood pressure, pulse, and breathing. He claims excep- 
tionally good results from utilizing Inspiration-Expira- 
tion Ratio in breathing (I/E). This ratio becomes 
larger, owing to relatively long inspiration, in the emo- 
tional strain of lying. Landis has experimented with 
both circulatory and inspiratory types of measures. 
Chappell in a more recent investigation has attempted to 
check some of the work of the earlier investigators. He 
points out that “deception curves” might be obtained 
under conditions other than those of deception, as when 
the subjects had been told that their intelligence was 
being measured. 

Much of the experimental work has been done in the 
psychological laboratory with artificial, faked crimes. In 
a few instances, cooperation has been given by police 
authorities and actual criminal suspects have been stud- 
ied. Marston, using systolic blood pressure, and Larson,
using a combination of blood pressure and breathing 
records, both claim over 90 per cent accuracy in detect- 
ing lying. Per cent accuracy of detection as reported 
by Landis is considerably lower, although he believes 
both blood pressure and I/E ratio possess definite diag- 
nostic value for truth and lying. The best opinion seems 
to be that the extreme claims are too great to expect in 
practical applications. Undoubtedly much controlled 
investigation is needed before such methods can be of 
much use in actual criminal procedures. More experi- 
mentation is needed in real as contrasted with artificial 
laboratory situations; there needs to be some perfecting 
of instruments and technique; we must know more about 
the effects, on results, of familiarity of the subject with 
the procedure, of type of questioning, of type of sur- 
roundings in which questioning is done, and of person- 
ality make-up and emotional susceptibility of the subject. 
The method is not without considerable promise and, as 
an application of great practical significance, deserves 
these further studies. 

Considerable discussion has recently centered about 
another interesting procedure which has been suggested 
for the detection of guilt. Reference is made to the in- 
terest in so-called “truth serums.” These are essentially 
sedative drugs that, by forestalling the inhibitory action 
of the higher brain centers (which presumably direct the 
“lying”), leave the guilty one in an uninhibited state in 
which his responses to questioning will be the natural, 
truthful ones. The drug scopolamine has been utilized 
for this purpose; it is a drug often used in obstetrical 
anesthesia. It is probable that babblings during such 
anesthesia, of things ordinarily not told, led to the trial 
of the drug in detections of guilt. The following story, 
with names omitted, depicts one of the more publicized 
cases in which the method was employed: 

One morning ____’s men were called to a room in 
a city apartment hotel, where they found on a bed the 
naked corpse of a woman with cuts and bruises on her 
face, tooth marks on her body, blood on the pillow. 

Her clothing, torn in strips, was strewn on the floor. 
Identified as Mrs. F. H., she was found by the coroner 
to have died of a heavy blow on the head. She had 
been dead about twelve hours, but a woman’s voice had 
answered the telephone in the room only two hours be- 
fore the body was found. Perplexed, the policemen 
hunted the man with whom Mrs. H. had been living, a 
butcher named F. F. said he had not seen his mistress 
for two days, had spent the time in “night clubs.” He 
had obviously been drunk. Asked if he would take 
scopolamine to refresh his memory, he agreed. 

F. was taken to a hospital, and given four hypoder- 
mic injections of 1/150 grains each. For three hours he 
was incoherent. Then he revealed that he had downed 
several drinks with Mrs. H., tried to quiet her when she 
became boisterous, struck and choked her, tore her 
clothes. When she fell, struck her head against a metal 
bedpost, and lost consciousness, F. tried to revive her by 
biting her; he failed, remained several hours, then de- 
parted. What he did not tell while under the influence 
of scopolamine was that he had imitated a woman’s 
voice when he answered the telephone. He was held 
for first-degree murder.¹ 

Opinions range from very favorable ones which view 
such methods as capable, with extended skillful use, of 
eliminating much “third degree” brutality, to very adverse 
opinions which regard such procedures as unjust and 
unethical, and a defiance of human rights. These, of 
course, are ethical questions and have little bearing upon 

¹ Time, Vol. XXVI, No. 21, November 18, 1935, p. 55. 




the scientific efficacy of such a method, which remains 
yet to be demonstrated thoroughly. 

VIII. Summary 

Throughout this chapter we have noted much which 
emphasizes that our knowledge of the physiology of the 
emotions is incomplete at present. The rapid progress 
that is being made today in the fields of endocrinology 
and biochemistry gives us hope that the time is not far 
off when we shall know more about the factors we have 
been discussing. The rapidity of the progress will de- 
pend considerably upon the cooperation between psy- 
chologists and physiologists. The one group is inclined 
too often to talk glibly about the role of the endocrines 
and other physiological mechanisms in emotions without 
much definite knowledge as a basis; and the other group 
is often just as careless in dismissing without careful in- 
vestigation the effects and physiological manifestations 
of what it terms psychic factors. Progress, we repeat, 
will depend upon careful study in this common ground 
of psychology and physiology. And with this progress 
we may expect the quantitative estimate of emotional 
states through their physiological manifestations to be- 
come more accurate and more practical. 



CHAPTER XXVII 


Tests in the Motor and Sensory Fields 


Motor and sensory tests of the type to be discussed 
in this chapter have an importance to psychology 
for several reasons. First of all, they possess great in- 
terest from a historical standpoint. In one of the first 
chapters of this book, we noted the use of sensori-motor 
tests in the early investigations of individual differences. 
Many of these earlier tests have been admirably sum- 
marized and directions for their application given by 
Whipple in his Manual of Mental and Physical Tests, 
published in 1914. Motor and sensory tests have long 
interested the psychologist as a means of studying the 
relation between physical and mental characteristics. 
Motor and sensory deficiencies markedly affect social and 
emotional adjustments, achievement in school, and pro- 
ficiency in work. Hence, the importance that attaches 
to them in the problems of clinical testing, of vocational 
and educational guidance. 

The tests described in this chapter are selected from a 
large number of available motor and sensory tests. They 
have been selected because they are representative of the 
field, because they measure rather broad aspects of motor 
or sensory efficiency, or because they have been of particular 
interest or value to the psychologist.¹ In nature and 


¹ For more extensive discussions of such 
Tests, 



general purposes, they overlap some of the purely physi- 
ological measurements described in Chapters XXIII and 
XXVI. The reader may in some instances wish to con- 
sider all these tests together. 

1. Cephalic index. This is an index of head shape, 
calculated from measurements of head width and length 
made with a pair of head calipers (see Fig. 32): 

$$\text{Cephalic Index} = \frac{100 \times \text{Width}}{\text{Length}}$$

It has been customary to designate as dolichocephalic 
the “long-headed” with an index of below 75; as meso- 
cephalic the “medium-headed” with an index of 75 to 
80; and as brachycephalic the “broad-headed” with an 
index of above 80. 
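
The index and its customary classes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the sample caliper readings are hypothetical, and the boundary values of 75 and 80 are assigned to the medium-headed class, following the ranges given above.

```python
def cephalic_index(width, length):
    """Return 100 * width / length (both measured in the same unit)."""
    if length <= 0:
        raise ValueError("head length must be positive")
    return 100.0 * width / length


def classify_head_shape(index):
    """Apply the customary cut-offs: below 75, 75 to 80, above 80."""
    if index < 75:
        return "dolichocephalic"
    if index <= 80:
        return "mesocephalic"
    return "brachycephalic"


# Hypothetical caliper readings in centimeters (not taken from the text).
index = cephalic_index(width=15.0, length=19.0)
print(round(index, 1), classify_head_shape(index))
```

Because the index is a ratio, any unit may be used so long as width and length are taken in the same one.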



The cephalic index has interested chiefly the anthro- 
pologist in his search for measurements differentiating 
racial groups, and the psychologist in his search for physi- 
cal measurements reflecting mental ability. In the early 
sensori-motor stage of ability testing, head measurements 
were among the favorite physical tests. Before the de- 
velopment of his intelligence scale Binet investigated the 




possibilities of cephalic measurements as means of indi- 
cating intelligence quantitatively; in fact, the greater 
part of Binet’s experimental output during the years 
1901 and 1902 deals with cephalic measurement. His 
conclusions are based primarily upon measurements of 
about 250 children of ages 11 to 13. His subjects were 
selected on the basis of estimates of their ability made 
by their instructors, and were estimated as decidedly 
superior or inferior. His average head measurements 
favored, so far as head size is concerned, the intelligent 
group as compared with the unintelligent, but the dif- 
ferences were small and the overlapping great. Binet 
himself admitted at the time of his study that his head 
measurements could not be regarded as safe tests of in- 
telligence for purposes of individual selection, and that 
except for the very extreme measures, head size and 
shape did not constitute satisfactory indicators of intel- 
lectual capacity. 

Another of the early studies, similar to Binet’s, is that 
of Pearson. He correlated head measurements with 
teachers’ estimates of intelligence for over 5,000 subjects, 
including a group of college students, a group of twelve-year-old 
boys, and a group of twelve-year-old girls. The 
correlations for these groups were, respectively, -.06, 
-.04, and +.07. 

More recent studies of this same relationship, made 
after the advent of intelligence tests, substantiate the 
results of the earlier studies in which intelligence was 
only estimated. Most of the reported correlations be- 
tween cephalic index and intelligence test measures are 
below .10. 

2. The strength tests. Various strength tests have 
been used in psychological and physiological studies. 
The studies have had widely different purposes. Some 




have aimed at evaluating athletic and gymnastic ability; 
some have depicted growth and maturation in physical 
capabilities; some have aimed at measuring physical 
traits important in industrial and vocational perform- 
ances; and many conducted in the psychological labora- 
tory have aimed at finding the relationships between 
strength and other traits and abilities. 

One of the most popular of the strength measurements 
in the psychological studies is that of measurement of 



Fig. 33. Dynamometer for Measuring Hand Strength. 

hand strength, made by use of the hand dynamometer. 
(See Fig. 33.) The subject grips the two handles in such 
a way that the second phalanges of the fingers press 
against the inner handle. The strength of the grip is 
indicated in kilograms on the dial. Norms for strength 
of grip on such an instrument are given in Table XXIX. 

A number of investigations have been made of the re- 
lationship between strength of grip and mental ability. 




With children the correlations reported have usually been 
positive. Johnson² reports a correlation as high as .71 
between strength of grip and mental age for 262 children 
between the ages of 3 and 13. However, with older 
groups, and with the younger groups when age is held 
constant, the correlation is near zero. 
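
Holding age constant is commonly done with a first-order partial correlation. The sketch below uses that standard formula, which is an assumption here and is not stated to be the method of the studies cited. Only the .71 comes from the text; the two correlations with age are hypothetical figures chosen for illustration.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of x and y with z held constant (first-order partial r)."""
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denominator == 0:
        raise ValueError("a correlation of exactly 1 with z leaves nothing to partial out")
    return (r_xy - r_xz * r_yz) / denominator


r_grip_mental_age = 0.71   # reported by Johnson
r_grip_age = 0.85          # hypothetical
r_mental_age_age = 0.83    # hypothetical

# With both traits rising steeply with age, little is left once age is removed.
print(round(partial_correlation(r_grip_mental_age, r_grip_age, r_mental_age_age), 3))
```

With figures of this kind, nearly all of the raw correlation is accounted for by the fact that both grip and mental age increase as children grow older.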

Table XXIX 

NORMS FOR STRENGTH OF GRIP (IN KILOGRAMS)³ 

             Boys                    Girls 
Age    Right Hand  Left Hand   Right Hand  Left Hand 
  6       9.21       8.48         8.36       7.74 
  7      10.74      10.11         9.88       9.24 
  8      12.41      11.67        11.16      10.48 
  9      14.34      13.47        12.77      11.97 
 10      16.52      15.59        14.65      13.72 
 11      18.85      17.72        16.54      15.52 
 12      21.24      19.71        18.92      17.78 
 13      24.44      22.51        21.84      20.39 
 14      28.42      26.22        24.79      22.92 
 15      33.39      30.88        27.00      24.92 
 16      39.37      36.39        28.70      26.56 
 17      44.74      40.96        29.56      27.43 
 18      49.28      45.01        29.75      27.66 


The dynamometer has been used in several studies of 
fatigue. Fatigue “curves” have been constructed from 
successive dynamometer records, and their characteris- 
tics studied. In the investigation of fatigue recently 
sponsored by the Society of Automotive Engineers,⁴ the 
hand dynamometer was studied as a possible means of 
measuring degrees of fatigue. It was finally discarded 


² Johnson, B. J., Mental Growth of Children in Relation to Rate of 
Growth of Bodily Development, E. P. Dutton and Co., New York, 1925. 

³ Whipple, G. M., Manual of Mental and Physical Tests, Simpler 
Processes, Warwick and York, Baltimore, 1914. 

⁴ Journal of Society of Automotive Engineers, Vol. XXVI, No. 4, 
April 1930. 




in these studies, since strength of grip showed no definite 
variation with relatively mild degrees of fatigue. 

In a recent study⁵ of handedness the hand dynamometer 
was utilized as one basis for determination of degree of 
handedness, the degree of handedness being taken as the 
ratio of performance by the right as compared with the 
left hand. 
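
As a brief illustration of such a ratio, the sketch below divides right-hand grip by left-hand grip. The function name is hypothetical, and the study cited may have combined the dynamometer with other measures; the sample figures are the norms for ten-year-old boys in Table XXIX.

```python
def handedness_ratio(right_kg, left_kg):
    """Ratio of right-hand to left-hand grip; above 1 favors the right hand."""
    if left_kg <= 0:
        raise ValueError("left-hand grip must be positive")
    return right_kg / left_kg


# Norms for boys aged 10 from Table XXIX: right 16.52 kg, left 15.59 kg.
print(round(handedness_ratio(16.52, 15.59), 2))
```

A ratio near 1 indicates little difference between the hands; the farther it departs from 1, the more marked the preference.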

3. Tests for sensory acuity. Sensory efficiency, espe- 
cially in hearing and vision, is of such great importance 
in our daily lives that it is only natural that standard 
instruments should have been developed to measure it. 
Measurements are often made by the ophthalmologist or 
the otologist, but from a very early period psychologists 
have been interested in sensory studies. 

In the psychological studies sensory capacity is com- 
monly measured by two general methods. By the first 
we find the least stimulus which can be perceived by the 
subject; this minimum constitutes the absolute thresh- 
old or limen of sensitivity. The dimmest light that can 
be perceived constitutes the absolute threshold of vision; 
the faintest sound that can be heard, the absolute thresh- 
old of hearing. By the second method we find the 
smallest difference between two stimuli that can be dis- 
tinguished by the subject. This difference constitutes 
the just noticeable difference or the differential threshold. 
To illustrate, if we are investigating pitch sensitivity in 
hearing tones, we determine the smallest difference in 
pitch that the subject can distinguish between two tones. 
Absolute threshold and differential threshold are not 
perfectly correlated. We do not always find low dif- 
ferential thresholds in those with low absolute thresholds. 
Absolute threshold may be said to be a measure of general 
sensory efficiency; differential threshold is particularly 
related to matters of fine sensory appreciation. Seashore 
has stressed such measurements in hearing in his studies 
of musical talent. 

⁵ Roos, Mary M., Unpublished Thesis, The George Washington 
University. 

Auditory acuity is commonly measured approximately 
by “voice tests,” “watch tests,” or “tuning fork tests,” 
in which records are made of the distances at which the 
various sounds can be heard. These methods are obvi- 
ously hard to standardize, and the results are, therefore, 
likely to be rather inaccurate. More accurate measure- 
ments of auditory acuity make use of audiometers, in- 
struments designed for accurate production of sounds of 
varying loudness or intensity. One of the best of these 
is the audiometer designed by Seashore. The stimuli 
consist of series of clicks of graduated intensity, which 
are heard through telephone receivers. Space does not 
permit here a detailed account of this instrument. 

Visual acuity is most frequently measured by a stand- 
ard set of test cards from which the subject reads letters, 
numbers, or designs at a given distance. The most fa- 
miliar device is the Snellen Chart. Rows of letters vary- 
ing in size are printed on the chart and are to be read by 
the subject. Given on the chart is the distance at which 
the normal eye should be able to read the letters of each 
row. Visual acuity is expressed as a fraction, the nu- 
merator of which is the distance the subject stands from 
the chart (usually 20 feet), and the denominator of which 
is the distance value of the smallest letters that can be 
read. For example, if a person can read at 20 feet the 
letters normally readable only at 15 feet his vision is 
20/15 (better than normal); if he can read at 20 feet no 
smaller letters than can normally be read at 30 feet, 
his vision is 20/30 (poorer than normal). 
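
The Snellen notation just described can be sketched as follows. The function names are hypothetical; the rule itself (test distance over the distance value of the smallest row read) is the one given in the text.

```python
def snellen_fraction(test_distance_ft, smallest_row_ft):
    """Write acuity as test distance over the distance value of the smallest row read."""
    return f"{test_distance_ft}/{smallest_row_ft}"


def describe_acuity(test_distance_ft, smallest_row_ft):
    """Compare the smallest row read with the row a normal eye reads at this distance."""
    if smallest_row_ft < test_distance_ft:
        return "better than normal"
    if smallest_row_ft > test_distance_ft:
        return "poorer than normal"
    return "normal"


# The two examples from the text, with the subject standing at 20 feet.
for row in (15, 30):
    print(snellen_fraction(20, row), describe_acuity(20, row))
```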

Other standard visual acuity tests include the Ewing 




Test, a series of charts designed to avoid some of the lim- 
itations of the Snellen Test; and the McCallie Tests, a 
series of charts designed especially for testing children 
and illiterates. 

4. Color blindness. Color blindness is an interesting 
defect in the sensory field that has been the subject of a 
number of psychological studies. There are various 
forms of color blindness, the commonest of which is red- 
green blindness, in which these two colors are confused 
with each other and with other colors of like brightness 
and saturation. Rarer forms of color blindness are the 
yellow-blue and the total forms. 

Two standard tests for color blindness are the Holm- 
gren Woolens Test and the Ishihara Test. In the former 
the subject must pick out from numerous small skeins 
of wool those which resemble in color three large skeins 
which are green, rose, and red in color. Errors in pick- 
ing out resembling colors indicate color blindness. The 
Ishihara Test consists of sixteen color plates designed so 
that the normal person can easily distinguish numbers 
on certain of the plates, whereas the color-blind person 
has difficulty in distinguishing the numbers or fails to 
see them altogether. 

5. Steadiness tests. Steadiness tests have been popu- 
lar in the psychological laboratory for many years. They 
have been employed in studies of individual differences 
and in studies of age and sex differences. They have 
been used in studying the effects of fatigue and of drugs 
such as alcohol, tobacco, and caffeine. 

One of the oldest and most commonly used of these 
tests is the hand steadiness test designed to measure 
muscular control and control of tremor. One design of 
this test worked out and standardized by Swope of Pur- 
due University and used in a series of automobile driving 




tests is illustrated in Fig. 34.⁶ The subject takes the 
test by inserting the stylus successively in each hole, 
beginning with the largest and moving forward at the 
rate of about one hole per second. The stylus is pushed 
to a back stop at the depth of an inch, the subject being 
careful not to touch the brass plate with the stylus. 
Should the edge of the hole be touched with the stylus, 












Fig. 34. Instrument for Testing Hand Steadiness. The instrument 
is hung in a rigid position about 36 in. from the floor of the car. 
Its position requires that the operator sit forward on the seat and thus 
have no support for his back. 


a buzzer inside the cabinet rings, and the number below 
the hole where contact is made is recorded against the 
subject’s score. 

⁶ Reproduced by courtesy of Ammon Swope and the Journal of the 
Society of Automotive Engineers. 

Fig. 35. Moss Wabblemeter for Measuring General Bodily 
Steadiness. The platform on which the subject stands moves in a 
vertical direction upon a central pivotal point as the weight is shifted 
from one foot to the other and from heel to toe. This motion is totalized 
and registered on two counters. The count increases with fatigue. 

The Moss Wabblemeter is a recently designed instrument 
for measuring general bodily steadiness. The instrument 
was designed in connection with a research 
study of fatigue produced by automobile driving, spon- 
sored by the Society of Automotive Engineers, and was 
utilized as a means of indicating degree of fatigue. Fig. 
35 indicates the nature of the Wabblemeter. The 
platform on which the subject stands moves upon a 
central pivotal point as the weight is shifted from one 
foot to the other and from heel to toe. This motion is 
totalized in the instrument and registered on two 
counters. 

Moss has reported a study of diurnal variations in 
steadiness as measured by this machine, and a number 
of studies of effect of fatigue produced by automobile 
and airplane riding and driving upon steadiness.⁷ The 
diurnal variations are shown in Fig. 36. There is a 
fairly general opinion that people are much steadier in 
the morning than they are in the middle of the afternoon; 
this seems to be far from the truth in this study. 
As a matter of fact, the most unsteady records are those 
of the early morning hours. The eight o’clock morning 
records were on the average about 60 per cent higher than 
the records at two in the afternoon. The increase in 
steadiness over part of the day probably has much of 
the nature of a “warming-up” period, such as the base- 
ball player might find necessary. Such a study of diurnal 
variations is of importance in interpreting fatigue rec- 
ords or records taken for other purposes throughout a 
day. Moss’s studies of fatigue from travel in various 
types of vehicles show progressive decreases in steadiness, 
the actual amount of decrease being related to type of 

⁷ See Journal of the Society of Automotive Engineers, Vol. XXVIII, 
No. 6, May 1931. 




vehicle, speed of traveling, type of road, and weather 
conditions. 

Fig. 36. Diurnal Variations in Steadiness. 

6. Reaction time. Reaction time is a measure indicating 
the speed with which we respond to stimuli, the term 
ordinarily being applied in those situations calling for 
immediate responses. This speed is dependent primarily 
upon neurological factors of our makeup: the speed with 
which stimuli affect sensory nerve endings, 
the speed with which neural impulses are conducted, and 
the speed with which synaptic connections are made. Re- 
action times vary with the stimuli and with the responses 
to be made. They may be very short (only a few 



thousandths of a second) for simple motor responses to 
single, elementary stimuli, as a blink of the eye to a visual 
stimulus. They may be considerably longer where a 
more complex response is to be made; or where a choice 
of responses is involved, as in the tapping of one key if a 
red light is given and another key if the stimulus light is 
green. The reaction time for the same response to the 
same stimulus is not the same for all individuals. In 
fact, marked individual differences occur in reaction times. 

Theoretical interest in reaction time measurements has 
been in evidence for a long time. Studies in this field 
were favorites in early psychological laboratories and in 
the early psychological studies of individual differences. 
Neurologists also concerned themselves with studies of 
reaction time, particularly in its relationship to neural 
conduction. Studies along this line were reported by 
Helmholtz as early as 1850. 

With the growth of applied psychology, emphasis has 
been placed upon reaction time as an important element 
in many practical, everyday situations. The ability to 
drive an automobile safely, to react promptly in the 
emergency situations of driving, is directly related to 
one’s speed of reaction. In industry we find some men 
attempting to operate machines that require faster reac- 
tions than they are able to make, while others are operat- 
ing machines that do not demand their best in speed of 
reaction. This lack of agreement between the demands 
of the job and the ability of the man doing the job is an 
important cause of fatigue and boredom felt by workers. 
The individual learning to operate a typewriter must 
possess a certain minimum reaction time in motor re- 
sponse and in eye-hand coordination. All these examples 
emphasize the importance of reaction time in practical 
affairs and should point out to us the value of reaction 




time measurements in relation to accident prevention, 
placement in industry, and vocational guidance. 

During the past few years there have been several at- 
tempts to develop a reaction time instrument which 
would be of especial value in testing reaction time as it is 
involved in operating an automobile. Most of the at- 
tempts have aimed at developing a means of measuring 
the time it takes one to make the response of putting on 
the brake at a given signal. Since these instruments 
possess a greater practical value than many of the earlier 
instruments, we shall examine one or two of them as 
examples of measurement in the field of reaction time. 

One of the earliest means of measuring the reaction 
time involved in putting on the automobile brake at a 
given signal was worked out by Moss and Allen. Their 
measurements were made by having the individual move 
his foot from the accelerator to the brake at an auditory 
stimulus. The stimulus was produced by the firing of a 
gun attached beneath the running board of the car in 
such a fashion that at the moment the gun was fired a 
mark of red lead was made on the road. When the 
driver pressed the brake pedal, more red lead was re- 
leased and made another mark on the road. By main- 
taining a constant speed in the car and measuring the 
distance between the two red marks on the road, the time 
elapsing between the signal and the brake application 
could be calculated. 
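
The calculation implied here is simply distance divided by speed. The sketch below assumes the distance between the marks is measured in feet and the car's speed in miles per hour; the function name and the sample figures are hypothetical and are not taken from Moss and Allen's report.

```python
FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 3600


def reaction_time_seconds(mark_distance_ft, speed_mph):
    """Time between signal and brake application, from the distance between the marks."""
    if speed_mph <= 0:
        raise ValueError("the car must be moving at a constant positive speed")
    speed_ft_per_s = speed_mph * FEET_PER_MILE / SECONDS_PER_HOUR
    return mark_distance_ft / speed_ft_per_s


# Hypothetical trial: marks 22 feet apart at a steady 30 miles per hour.
print(reaction_time_seconds(22.0, 30.0))
```

The method depends on the speed staying constant between the two marks; any slowing before the brake is pressed would make the computed time too short.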

More recently, in connection with experimental work 
carried on for the Society of Automotive Engineers, Moss 
and Brown have developed another instrument for meas- 
uring reaction time. This instrument simulates the 
automobile situation in that the reaction to be made is 
the moving of the foot from one pedal representing the 
accelerator to another pedal representing the brake pedal, 
although the actual test is carried on in the laboratory. 
Reaction is made to a visual stimulus produced by a light 
in a bulb in front of the subject. This instrument is 
illustrated in Fig. 37. 



Fig. 37. Moss-Brown Reaction Time Instrument 





Rackley⁸ used this instrument in a series of studies of 
reaction time as related to various other factors. Some 
of his findings are fairly typical of results which have 
been obtained in various reaction time studies. 

He studied the relation of reaction time to mental 
ability, using a group of university students, a group of 
high school students, and a group of inmates of the Na- 
tional Training School for Boys in Washington, D. C. 
The first two groups were measured in mental ability by 
the use of group intelligence tests; the third group was 
tested by the Stanford Binet Scale. The correlations 
between intelligence test records and reaction times are 
given in Table XXX. 

Table XXX 

CORRELATIONS BETWEEN INTELLIGENCE AND REACTION TIME 

Group               Correlation    P.E. 
University             +.13        .03 
High School            +.23        .06 
Training School        +.55        .05 

It will be noticed that with the university students the 
correlation is lowest, being close to zero; while with the 
Training School group there is a 
significant positive correlation. The difference in rela- 
tionship in the three groups is probably related to the 
homogeneity or heterogeneity of the group. The group 
showing the lowest correlation is the most homogeneous, that 
is, the most alike in intelligence; whereas the group show- 
ing the highest correlation is the most heterogeneous, or 
widely different in intelligence. The National Training 
School group ranged from an I.Q. of 39 to an I.Q. of 113. 
The university group ranged from 90 to 125 I.Q., a dif- 
ference in spread of the two groups of 39 points. From 

⁸ Rackley, Lloyd Ernest, An Experimental Study of the Factors of 
Reaction Time as Exhibited by a Cross-Section of the Population of 
Washington, D. C.; unpublished thesis, The George Washington 
University. 




such a study we may conclude that in an unselected 
population there is a moderate positive correlation be- 
tween reaction time and intelligence. In a selected 
group, particularly one selected from the upper levels of 
intelligence, there is practically no relationship between 
reaction time and intelligence. 
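
The shrinking of a correlation in a selected group can be illustrated with the usual formula for direct restriction of range. The formula is an assumption brought in for illustration (it supposes a linear relation and equal scatter at all levels) and is not part of Rackley's analysis. The ratio of standard deviations used below is hypothetical; only the +.55 is taken from Table XXX.

```python
import math


def restricted_correlation(r_unrestricted, sd_ratio):
    """Correlation expected when one variable's spread shrinks to sd_ratio of its full spread.

    sd_ratio = (standard deviation in the selected group) /
               (standard deviation in the unselected group).
    """
    if sd_ratio <= 0:
        raise ValueError("sd_ratio must be positive")
    r = r_unrestricted
    return r * sd_ratio / math.sqrt(1 - r ** 2 + (r ** 2) * (sd_ratio ** 2))


# Hypothetical: a selected group with half the spread in intelligence.
print(round(restricted_correlation(0.55, 0.5), 2))
```

Under these assumptions a correlation of +.55 in a widely spread group falls well below that figure when the group is narrowed, which is the direction of the difference shown between the Training School and university groups.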

The effect of practice on reaction time was studied on 
147 subjects altogether. One group of these subjects 
took 15 trials each, another group took 30 trials each, 
and a third group took 20 trials a day for 20 days. Rack- 
ley concludes from his study of these groups that with 
practice there is a small but definite reduction in reaction 
time. From this conclusion we might infer that long re- 
action times can be shortened somewhat by attention 
directed specifically toward practicing the act involved. 
The amount of reduction, however, is too small to indi- 
cate that such practice procedures would be of much 
avail in the practical situations in which an individual’s 
reaction time may be too long. 

Among the other results of Rackley’s investigations, 
with a brief conclusion about each, are the following: A 
slight difference in the reaction time of white and negro 
subjects was found, the difference being in favor of the 
whites. No significant difference in reaction time of 
experienced and inexperienced automobile drivers was 
found. An initial reduction in reaction time following 
severe physical exercise was found. A small but sig- 
nificant difference in the reaction time of the two sexes 
was indicated, in favor of men. Some variation in aver- 
age reaction time at different hours of the day was found, 
longer reaction times tending to occur very early in the 
morning as compared with trials taken later in the day. 
It is interesting to note that this conclusion agrees with 
the findings which we quoted from the studies of steadi- 
ness. 




INDEXES 




Subject Index 


A. 

Accident prevention, 306-309, 393, 454 

Accomplishment quotient, 223 
Achievement : 

measurement, 4, 40, 207-239 
college, 217-219 
elementary school, 213-214 
high school, 214-217 
professional schools, 227-239 
tests, 207-239 
comprehensive, 220-222 
examples, 213-223 
nature, 207, 223-226 
psychology’s contribution to, 
208-213 

Activity interests, 200 
Adaptability tests, 290 
Adaptation board, 75 
Adjustment questionnaire, 358 
Adjustment tests, 358 
Admission to college, tests in, 120- 
121 

Adrenalin, 434-436 
Adults, measurement of intelli- 
gence in, 107 
Age scales, 65-72 
American Council Psychological 
Examination, 107 
American Psychological Associa- 
tion, 35 

Annoyance test, 365-367 
Aptitude tests, 15, 40, 143-204 
engineering, 173 
law, 173 

medical schools, 174-186 
professions, 172-186 
teaching, 173-174 

Army Alpha Test, 18, 36-37, 42- 
49, 80-81, 105 

Army Beta Test, 36-37, 60-56, 77 
Army Performance Test, 37, 76 
Army testing, 37, 266, 268 
Arteriosclerosis, 409 
Art tests, 153-161 
Ascendance-Submission Scale, 330 


A-S Reaction Study, 330 
Assembly tests, 167, 168 
Association of American Medical 
Colleges, 175, 185 
Association tests, 115-117, 200- 
201, 368 

Attitude tests, 388-389 
Automobile driving, tests for, 448- 
454 

B 

Bacteriology test, 229-239 
Basal metabolism, 400, 401, 408, 
418 
Behavior : 
control, 10 
prediction, 10 
tests, 373, 380-386 
understanding, 10 
Binet Scale (see Binet Test) 
Binet-Simon Test (see Binet Test) 
Binet Test, 18, 24, 27-^3, 36, 65- 
72, 110-113, 203-204 
Blood cell measurements, 400, 405 
Blood pressure, 388, 398, 409, 428-430 

Blood sugar, 399, 406, 419 
Bookkeeping tests, 250 
Boston Elevated Railway, tests 
used in, 306 

Boston Psychopathic Hospital 
Memory Test, 114 
Brachycephalic, 442 
Breathing, measurement of, in emo- 
tions, 431-432, 438 
Bricklaying, motion study in, 310 

C 

Carbon-dioxide combining power of 
blood, 399, 406 
Casuist board, 74 
Cephalic index, 442-443 
Character education, 373, 386 
Character Education Inquiry, 375, 
379 





Cheating tests, 381-386 
Chemistry tests, 217-219, 251 
Civil Service Commission, tests of, 
107, 132, 266-271, 286 
Classification questions, 213 
Classification tests, 220-222 
Clerical tests, 290 
Coefficient of correlation, 21 
College Entrance Examination 
Board, examinations of, 224 
College grades, relation to intelli- 
gence tests, 122-123 
College intelligence tests, 104 
College of Physicians and Surgeons, objective tests in, 229 
Colloidal gold test, 407-408 
Color blindness, 448 
Completion questions, 212 
Composition scale, Willing’s, 220 
Comprehensive achievement tests, 
220-222 

Conduct knowledge tests, 373, 375-380 

examples, 376-379 
value, 379 

Consonance, measurement of, 145, 
147 

Constancy of mental ability, 137- 
138 

Correlation, 21 
Cretin, 62 

Criminology, tests in, 133-135, 
371, 372 

Criteria, test, 21, 284-285 
Critical scores, 279 
Cube test, 75, 76 


D 


Dearborn Tests, 79 
Deception tests, 381-386, 436-440 
Deficiency, measurement of, 59-81 
Delinquents, use of intelligence 
tests with, 133-135 
Dementia praecox, 112, 117, 363, 
409 

Department store management, 
tests in, 311-317 

Descriptive rating scale, 327-328 
Design tests, 77-79 
Deterioration, intellectual, 110, 112 
Diabetes, 406, 419 
Diagonal test, 75 


Digestive-system changes in emo- 
tions, 432-433 
Digit-symbol test, 76
Disciplinary cases, use of tests in, 
120 

Dishonesty, tests of, 381-386 
Dislikes, 187-189 
Dolichocephalic, 442 
Dynamometer, 444 

E 

Educational age, 222 
Educational quotient, 222 
Efficiency : 

measurement, 240-253 
ratings, 241 

Elemente der Psychophysik, 26
Emotional adjustment, measure- 
ment of, 358-362 
Emotionality score, 369 
Emotional maturity scale, 330 
Emotional reactions, measurement 
of, 4 

Emotional tests: 
physiological, 424-440 
verbal, 357-371 
Employees : 
selection, 124-133 
tests in control, 303-318 
Employment tests, 273-302 
Endocrine tests, 410-423 
Energy, 185 

Entrance credits, medical school, 
184 

Entrance tests, college, 120-121 
Epilepsy, 112 
Ergometer, bicycle, 397 
Ethical discrimination test, 368, 
375 

Experimental psychology, 25 
Experimental Study of Intelli-
gence, 30

Extroversion, measurement of, 
362-365 


F


Fatigue : 
in industry, 262 
measurement, 393-402 
Feature profile, 75-76 





Feeble-mindedness (see Mental
deficiency) 

Form board test, 72 


G 


Genius (see Superiority) 

George Washington University Se- 
ries tests, 105, 214, 217, 228 
Glands : 
adrenal, 411 
gonads, 411 

importance in psychology, 410 

pancreas, 412 

parathyroid, 412 

pituitary, 411 

sex, 411 

thyroid, 411 

Glandular function tests, 410-423
Grades, relation to intelligence
tests, 122-124

Graphic rating scale, 328-329 
Group tests, intelligence, 32-34,
41, 80-81 


H 


Haggerty Intelligence Tests, 39, 79 
Haggerty Reading Examination, 
213-214 

Handedness, 446 
Handwriting scale: 

Ayres, 220-221 
Thorndike, 220 
Head measurements, 442-443 
Healy Puzzle Test, 75 
Hemoglobin tests, 405 
Hereditary Genius, 86, 138
Heredity, 138-139 
History of measurement, 25-41 
Background Period, 25-28 
Binet Period, 28-32 
Maturing Period, 39-40 
Post-War Period, 32-34 
Preliminary Period, 32-34 
Present Period, 41 
World War Period, 34-37 
Holmgren Woolens Test, 448 
Honesty tests, 332, 381-386 
Hormone tests, 418-422 
Humor, test for, 344 
Hydrocephalic, 62 


I 

Identification questions, 212 
Idiosyncrasy score, in emotions, 
369 

Idiot, 61 
Imbecile, 61 
Individual tests, 65-72 
Industry, measurement in, 257-268 
during the World War, 265-266 
early experiments, 262-265
foundations, 257-262 
testing of applicants, 266-271
Information tests : 
interest in agricultural engineer- 
ing, 196 

interests, 196-199 
mechanical interest, 197 
play interests, 199 
social interests, 197 
vocational interests, 198 
Insane, tests for, 109-117, 373, 
403-409 

Inspiration-expiration ratio, 431, 
438 

Insulin, test for, 419 
Intelligence : 
constancy, 137-138 
distribution, 92, 94 
measurement : 
in deficiency, 59-81 
in superiority, 82-108 
quotient, 19, 69-70 
tests, 4, 15, 23-41, 59-139, 268- 
269 

college, 104-107, 119-123 
for adults, 107-108 
for deficiency, 65-81 
group, 32-34 
industry, 268 
of the insane, 109-113 
relation to college grades, 122- 
123 

research, 136-139 
schools, 95-104, 123-124
specialized, 269 
uses, 118-139 

vocational selection, 124-133 
with delinquents and crimi- 
nals, 133-135 

Intensity, measurement of, 145-14^ 
Interest inventories, 180-195 
examples, 189 
scoring, 193 





Interest inventories (Cont.):
selection of items for, 193
Strong's Inventory, 189
validity, 194 
Interests, 187-204 
association tests for measuring, 
200 

children's, 196 

information tests for measuring, 
196 

inventories for measuring, 189 
objective, 188 

questionnaires for measuring, 
189 

rating scales for measuring, 195 
subjective, 187 
vocational, 189-196 
International Intelligence Test, 78, 
97-101 

Interne ratings, 182-183 
Interviews for medical students, 
183, 184 

Introversion, measurement of, 362- 
365 

Inventories, interest (see Interest
inventories)

Involution melancholia, 363, 414 
Iowa High School Content Exam-
ination, 222
Ishihara Test, 448 

J 

Job analysis, 275-276, 282 
Job efficiency, measurement of, 
240-263 

psychological tests, 250-253 
service ratings, 241-250 

K 

Kahn Test, 404 

Kent-Rosanoff Association Test, 
116-117 

Kymograph, 426-427 
L 

Laboratory, psychology, 27 
Laboratory tests in mental disor-
ders, 403-409 
Lange’s test, 407-408 
Lie detector, 436 
Likes, 187-189 
Limen of sensitivity, 446 


Literature test, high school, 214- 
217 

Lying, tests for, 381-386, 436-440 
M 

MacQuarrie Test for Mechanical 
Ability, 169, 171 

Macy Department Store tests, 311- 
317 

Mania, 363 

Manic depressive psychosis, 117 
Manikin test, 75, 76 
Man-to-man rating scale, 325-326 
Mare and Foal Board, 73 
Marietta Apparatus Company, 169 
Matching test, 212 
Maze tests, 77-79
McAdory Art Test, 157-161 
McCallie Tests, 448 
Measurement : 
character, 372-389 
development, 6 
emotions, 357-371, 424-440
fatigue, 393-402 
history, 23-66 

in control of employees, 303-318 
in everyday life, 6 
in higher education, 228 
in motor and sensory fields, 441- 
454 

in physical sciences, 13 
in psychology, 10 
in selection of employees, 280- 
302 

interests, 187-204 
morality, 372-389 
origin, 6 

personality, 321-389 
problems, 11 
reaction time, 452-454 
social attributes, 334-356
steadiness, 448-452 
teaching efficiency, 251-253 
value, 3-12 
what is, 3 

Mechanical aptitude: 
measurement, 162-171 
tests, 15, 40, 167, 169, 171, 197 
Mechanical engineering tests, 228 
Mechanical interest tests, 197 
Medical aptitude test, 174-186 
development, 174-176 
nature, 175-180 

relation to interne ratings, 182- 
183 





Medical aptitude test (Cont.) 
relation to medical school grades, 
180-182 

sample questions, 176-180 
validity, 180-186 

Medical Examinations, State 
Board, 175 
Medical school: 
admission to, 174-186 
grades, 180-182 

prediction of success and failure 
in, 181-184 

Meier-Seashore Art Judgment 
Test, 154-157 
Memory : 

for names and faces, 335, 342 
tests, 114-115 

Mental age, 32, 59-60, 67, 69 
Mental Alertness Test, 105-106
Mental deficiency : 

among criminals and delinquents, 
134-135 

definition, 59-60 
examples, 71-72 
measurement, 59-81 
problems, 62-63 
tests, 63-81 

Mental disease, glands in, 414-416 
Mental disorders, tests in, 403-
409 

Mental growth, 136-137 
Mental Hygiene Inventory, 35-39 
Mental Hygiene Test, Colgate, 359
Mental tests (see Intelligence
tests) 

Mesocephalic, 442 
Metabolism, 400, 401, 408, 418 
Microcephalic, 62 
Minnesota Assembly Test, 169 
Minnesota Mechanical Ability 
Tests, 167-169 

Minnesota Paper Form Board, 
168-169 
Moron, 61 
Motility, 321 
Motion study, 309-311 
Motormen, tests for, 263-265, 306 
Motor tests, 441-454 
Multiple-choice questions, 211 
Musical aptitude: 
effect of training on, 153 
inheritance, 153 
measurement, 144-153 
racial studies, 153 
Musical memory, measurement of, 
145, 147 


N 

National Intelligence Test, 39, 81, 
95-98 

Neurasthenia, 359, 409 
Norms, 279 

Numerical rating scale, 326-327 
O 

Occupations, intelligence limits for, 
128-130 

Œstrous state, 420
Opinions tests, 386, 386-389 
O’Rourke Mechanical Aptitude 
Test, 170-171, 197 
Otis Intelligence Tests, 38, 101- 
104, 169 

Ovarian hormone test, 420 
P 

Parathyroid gland, 420 
Paresis, 62, 112, 401 
Passing points, 279 
Patrolmen, tests for, 270 
Percentile rating, 19 
Performance : 
scale : 

Army, 76 

Pintner-Paterson, 73-75 
tests, 72-77 

Personal Data Sheet, 358 
Personal Inventory, Laird’s, 363- 
365 

Personal Traits Rating Scale, 363
Personality : 

difficulty of measuring, 322-325 
glands in, 413-415 
inventory, 360 
measurement, 321-389 
nature, 321 
schedule, 359 

Personnel selection, tests in, 124- 
133, 286-302 

Physiological measurement, 25,
393-454
character, 387 
emotions, 424-440 
fatigue, 393-402 
Picture arrangement test, 76 
Picture completion tests, 75, 76 
Picture tests, 77-79, 167 
Pig iron experiment, 259 
Pintner-Cunningham Test, 70 





Pintner-Paterson Performance
Scale, 73-75 

Pitch, measurement of, 145, 146 
Play interests test, 199 
Policemen, tests for, 270, 290 
Porteus Test, 77-79 
Postal tests, 290-302 
practicability, 295 
reliability, 2^ 
samples, 296-302 
validity, 291 
Pregnancy tests, 422 
Premedical scholarship, 184 
Pressey X-O Test, 367-371
Probst Service Rating Scale, 241- 
250 

characteristics, 243-245 
development, 241 
distribution of ratings, 248 
reliability, 249 
scoring, 246-247 
validity, 249-250 

Prodigies, study of, 85-86, 88-90 
Product scales, 219-221
Professional aptitude tests, 172- 
186 

Professional school achievement 
tests (see Achievement 
tests) 

Psychasthenia, 359 
Psychogalvanic reflex, 388, 433- 
434 

Psychological laboratory, 304 
Psychological tests, 14-16 
Puzzle block test, 73 

Q 

Questionnaire, 17, 329-332, 358 
Quotient, intelligence (see Intelli- 
gence quotient) 

R 

Rating:
percentile, 19 
scale, 16, 325-329 
interests, 195-196 
job efficiency, 241-260
Raw measures, 18
Reaction time, 10, 27, 388, 452- 
454 

Reading Examination, Haggerty,
213-214


Relative measures, 18 
Reliability, 19-21 
Research Division, Civil Service 
Commission, 266, 269 
Respiration measurements, 402 
Revue Philosophique, 29
Rhythm sense, measurement of, 
145, 147 

S 

Scale, rating (see Rating scale) 
Scatter, in intelligence testing, 
112-113 

Schizophrenia, 359 
Scholarship, low, use of tests in, 
120-121 

Scholastic Aptitude Test for Medi- 
cal Schools (see Medical 
aptitude test) 

School achievement tests (see 
Achievement tests) 

Scientific management, 258 
Scopolamine, 438 

Seashore musical aptitude test, 
144-153 

Seguin form board, 73, 74 
Selection of employees, measure- 
ment in, 259, 280-302 
Selective process in school groups, 
94 

Self-expression, 321 
Senile psychosis, 112 
Sensitivity, tests of, 446-448 
Sensory tests, 25-28, 441-454
Service ratings, 241 
Service rating scale, 241-260 
Sewing scale, 220 
Sex hormone tests, 420-423
Short-answer questions, 40, 219- 
226 

appeal, 226 

comprehensiveness, 226 
construction, 224 
development, 209-211
difficulty, 224 
objectivity, 223 
reliability, 223 
scoring, ^5 
types, 211-213 

Short-answer tests in industry, 271 
Shorthand tests, 250 
Snellen chart, 447 
Social adaptability, 185 





Social attributes, measurement of,
334-356

Social Intelligence Test, 15, 197, 
200, 332, 334, 335-351 
Sociality, 321 

Social maturity scale, 351-356
Social quotient, 353 
Social relations test, 198 
Society of Automotive Engineers, 
448, 454 

Sones-Harry High School Achieve- 
ment Test, 222 
Special ability tests, 40 
Specialized intelligence tests, 269 
Special talents, measurement of, 
143-161 

Sphygmomanometer, 398 
Spinal fluid tests, 400-408 
Stanford Achievement Test, 222 
Stanford Revision of Binet Test 
(see Binet Test) 

Steadiness tests, 448-452 
Stenographer-typist tests, 290 
Stenquist Mechanical Aptitude
Tests, 164-167, 171
Street-car motormen, tests for, 
263-265 

Strength tests, 443-446 
Substitution test, 75 
Superiority, 82-108 
definition, 85

development of interest in, 85 
examples, 90-91
in school children, 87, 91-94 
Terman’s study, 88 
tests for measuring, 95-108 
Syphilitic infection, 62, 404, 407 

T 

Tabes dorsalis, 404 
Talents, measurement of, 143-161
Taylorism, 258 
Teaching Aptitude Test, 174 
Teaching efficiency, measurement
of, 250-253 
Temperament, 321 
construction, 273-279, 283 
questions : 

construction, 277 

scale, 330 


Temperament (Cont.) 

selection, 270 
Tests (see Measurements) 
Thorndike Intelligence Examina- 
tion, 18, 39, 100 
Threshold of sensitivity, 446
Thyroid, 62, 409, 418 
Time and motion study, 309-311 
Time sense, measurement of, 145, 
147 

Tiredness, 394-395 
Trade interest test, 197 
Trade tests, 250, 271 
Training of personnel, 259, 262 
Triangle test, 74 
True-false questions, 211 
Truth serum, 438-440 
Typewriting tests, 250 

U 

Unreliability of grading, 209-210

V 

Validity, 19-21, 278 
Vineland maze tests, 77-79 
Vineland social maturity scale, 
351-356 

Vocational Interest Blank, 189 
Vocational interest test for college 
women, 198 

Vocational tests, 10, 40, 124-133,
172-186, 189-196 

Vocation-to-vocation rating scale 
of interests, 195 

W 

Wabblemeter, 449-452 
Wasserman reaction, 404, 407 
Weber’s law, 26 

X 

X-O Test for Emotions, 367-371

Y 

Yerkes-Bridges-Hardwick Point
Scale, 33, 204 




Index of Names 


A 

Allen, 454 

Allport, F. H., 331, 363 
Allport, G. W., 331, 363
Anderson, 311 
Ayres, 220-221 

B 

Benedict, 397 
Bernreuter, 360, 362 
Bills, 130 

Binet, 28-32, 65-72, 90, 110-113, 
443 

Bingham, 284, 303, 306-309, 318 
Blatz, 431 
Book, 262 
Brace, 441 
Broom, 349 
Brown, 454 
Bryan, 262 
Burr, 133 
Burtt, 196, 311 

C

Cady, 362 
Cannon, 432-436
Carmichael, 82 
Carpenter, 397 
Cason, 365-367 
Cattell, 9, 10, 12, 27 
Chappell, 437 
Chave, 388 
Clendening, 414 
Clothier, 125 
Colloton, 138 
Cowdery, 189 
Cox, 87, 195 
Cunningham, 79 

D 

Dearborn, 79 
Doll, 351-356 
Drury, 406 


E 

Elliot, R. M., 167 
Elliott, E. C., 209 


F 

Farran-Ridge, 406 
Fechner, 26 
Fere, 434 
Fernald, 375 
Ferson, 173 
Filer, 286, 290 
Fracker, 153 
Freeman, 138, 374 
French, 396 

Fryer, 127-128, 187-188, 203-204 
G 

Galton, 86, 138 
Garrett, 349 
Garrison, 138 
Gilbreth, 258, 310 
Goddard, 134 
Goldstein, 421 

H 


Haggerty, 39, 79, 213-214 

Hall, 27 

Hardwick, 204 

Harris, 9 

Harry, 222 

Harter, 262 

Hartshorne, 332, 375, 379, 382, 384 

Healy, 134 

Helmholtz, 26, 453 

Hildreth, 139 

Hill, 134 

Hollingworth, 153 
Holzinger, 139 
House, 359 
Howard, 153 
Hubbard, 125, 131 
Humm, 330 





Hunt, 107, 252, 336, 347, 340, 351, 
306, 406 

Hunter, 228, 396 
J 

Johnson, 77, 153, 445 
K 

Kellogg, 349 
Kelly, 222 
Kent, 115-117 
Kitson, 195-196 
Kohlstedt, 363 

L 

Laird, 359, 363-365
Landis, 434, 437-438 
Larson, 148, 153, 437 
Lenoir, 153 
Loman, 252 

M 

MacQuarrie, 169, 349 

Marston, 437 

Mathews, 359, 361 

Mathewson, 125 

May, 332, 375, 379, 382, 384 

Mazer, 421 

McAdory (see Siceloff)

McGeoch, 370 
McHale, 198 
Meier, 154-157 
Merriam, 139 
Miner, 134 

Moss, 174, 185, 197, 228, 252, 270, 
335, 396, 406, 449, 455 
Münsterberg, 263-265
Murchison, 135 
Murdock, 220 
Murphy, 117 

N 

Neyman, 363 

O 

Omwake, 197, 214, 335 
Orr, 378 


O’Rourke, 107, 132, 170, 171, 197,
286-291, 349
Otis, 38, 101 

P 

Paterson, 73, 167, 434, 441 
Pearson, 443 

Pintner, 73, 79, 88, 92, 122, 134, 
136, 137, 349 
Poffenberger, 3^ 

Porteus, 77-79 
Pressey, L. W., 38 
Pressey, S. L., 38, 367-372 
Probst, 241-250 

R 

Rackley, 456-457 
Ream, 198 
Remmers, 121-122 
Roe, 396 
Ronning, 214 
Roos, 446 
Rosanoff, 115-117 
Ruch, 222 
Rugg, 138 

S 

Saam, 33 
Schwarz, 214 
Scott, 125, 262 
Scripture, 27 

Seashore, 143-153, 154, 447 
Shuttleworth, 375 
Siceloff, 154, 157-161 
Simon, 30, 65 
Sones, 222 
Stanton, 153 
Starch, 209 
Stenquist, 162-167 
Stoddard, 173 
Strang, 349 
Strong, 189-194, 203 
Swoi»€, 448-449 
Sylvester, 32-33 

Symonds, 375, 380, 385-386, 429 
T 

Tarchanoff, 434 
Taylor, 25^261 
Telford, 241, 270 





Terman, 39, 65-72, 88, 90-92, 138,
199, 204, 222

Thorndike, 39, 106, 136, 139, 220,
350 

Thurstone, L. L., 107, 173, 359, 
362, 388-389 
Thurstone, T. G., 359 
Tiegs, 209-210 
Toops, 119, 197 
Tredgold, 60 

U 


Unger, 133 
Upshall, 349 

V 


Vigouroux, 434 
Viteles, 257, 259 
Voelker, 375 


W 

Wadsworth, 330 
Weber, 26 
Wells, 114 

Whipple, 34, 39, 441, 445 

Whitely, 370 

Willing, 220 

Willoughby, 330 

Wood, 106, 123, 224, 228-229 

Woodworth, 358-359 

Woodyard, 161

Wundt, 27 

Wyman, 200-203 

Y 

Yerkes, 35, 39, 204 
Z 

Zyve, 173 






